We consider a one-dimensional diffusion process $(X_t)$ which is observed at $n + 1$ discrete times
with regular sampling interval $\Delta$. Assuming that $(X_t)$ is strictly stationary, we propose
nonparametric estimators of the drift and diffusion coefficients obtained by a penalized least squares
approach. Our estimators belong to a finite-dimensional function space whose dimension is
selected by a data-driven method. We provide non-asymptotic risk bounds for the estimators.
When the sampling interval tends to zero while the number of observations and the length of
the observation time interval tend to infinity, we show that our estimators reach the minimax
optimal rates of convergence. Numerical results based on exact simulations of diffusion processes
are given for several examples of models and illustrate the qualities of our estimation algorithms.

1. Introduction

In this paper, we consider the following problem. Let $(X_t)_{t \ge 0}$ be a one-dimensional
diffusion process with dynamics described by the stochastic differential equation:

$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t, \quad t \ge 0, \quad X_0 = \eta, \tag{1}$$

where $(W_t)$ is a standard Brownian motion and $\eta$ is a random variable independent of
$(W_t)$. Assuming that the process is strictly stationary (and ergodic), and that a discrete
observation $(X_{k\Delta})_{1 \le k \le n+1}$ of the sample path is available, we wish to construct
nonparametric estimators of the drift function $b$ and the (square of the) diffusion coefficient
$\sigma^2$.

Our aim is twofold: to construct estimators that have optimal asymptotic properties
and that can be implemented through feasible algorithms. Our asymptotic framework is
such that the sampling interval $\Delta = \Delta_n$ tends to zero while $n\Delta_n$ tends to infinity as $n$
tends to infinity. Nevertheless, the risk bounds obtained below are non-asymptotic in the
sense that they are explicitly given as functions of $\Delta$ or $1/(n\Delta)$ and fixed constants.

Nonparametric estimation of the coefficients of diffusion processes has been widely 
investigated in recent decades. The first estimators proposed and studied were based 
on a continuous time observation of the sample path. Asymptotic results were given for 
ergodic models as the length of the observation time interval tends to infinity: see, for 
instance, the reference paper by Banon [2], followed by more recent works by Prakasa 
Rao [30], Spokoiny [31], Kutoyants [28] or Dalalyan [18]. 

Then discrete sampling of observations was considered, with different asymptotic 
frameworks, implying different statistical strategies. It is now classical to distinguish 
between low-frequency and high-frequency data. In the former case, observations are
taken at regularly spaced instants with fixed sampling interval $\Delta$ and the asymptotic
framework is that the number of observations tends to infinity. Only ergodic models
are usually considered. Parametric estimation in this context was studied by Bibby and
Sørensen [11], Kessler and Sørensen [27]; see also Bibby et al. [12]. A nonparametric
approach using spectral methods was investigated in Gobet et al. [24], where non-standard
nonparametric rates were exhibited. 

In high-frequency data, the sampling interval $\Delta = \Delta_n$ between two successive
observations is assumed to tend to zero as the number of observations $n$ tends to infinity. Taking
$\Delta_n = 1/n$, so that the length of the observation time interval $n\Delta_n = 1$ is fixed, can only
lead to estimating the diffusion coefficient consistently. This was done by Hoffmann [25]
who generalized results by Jacod [26], Florens-Zmirou [21] and Genon-Catalot et al. [22].

Now, estimating both drift and diffusion coefficients requires that the sampling interval
$\Delta_n$ tends to zero while $n\Delta_n$ tends to infinity. For ergodic diffusion models, Hoffmann
[25] proposes nonparametric estimators using projections on wavelet bases together with 
adaptive procedures. He exhibits minimax rates and shows that his estimators automati- 
cally reach these optimal rates up to logarithmic factors. Hoffmann's estimators are based 
on computations of some random times which make them difficult to implement. 

In this paper, we propose simple nonparametric estimators based on a penalized mean
square approach. The method is investigated in detail in Comte and Rozenholc [16, 17]
for regression models. We adapt it here to the case of discretized diffusion models. The
estimators are chosen to belong to finite-dimensional spaces that include trigonometric,
wavelet-generated and piecewise polynomial spaces. The space dimension is chosen by a
data-driven method using a penalization device. Due to the construction of our
estimators, we measure the risk of an estimator $\hat f$ of $f$ (with $f = b, \sigma^2$) by
$\mathbb{E}(\|\hat f - f\|_n^2)$, where
$\|\hat f - f\|_n^2 = n^{-1} \sum_{k=1}^n (\hat f(X_{k\Delta}) - f(X_{k\Delta}))^2$. We give bounds for this risk (see Theorems 1
and 2). An examination of these bounds as $\Delta = \Delta_n \to 0$ and $n\Delta_n \to +\infty$ shows that our
estimators achieve the optimal nonparametric asymptotic rates obtained in Hoffmann
[25] without logarithmic loss (when the unknown functions belong to Besov balls). Then
we proceed to numerical implementation on simulated data for several examples of
models. We emphasize that our simulation method for diffusion processes is not based on
approximations (like Euler schemes). Instead, we use the exact retrospective simulation
method described in Beskos et al. [10] and Beskos and Roberts [9]. Then we apply the
algorithms developed in Comte and Rozenholc [16, 17] for nonparametric estimation
using piecewise polynomials. The results are convincing even when some of the theoretical
assumptions are not fulfilled.

The paper is organized as follows. In Section 2 we describe our framework (model, 
assumptions and spaces of approximation). Section 3 is devoted to drift estimation, and 
Section 4 to diffusion coefficient estimation. In Section 5 we study examples and present 
numerical simulation results that illustrate the performance of estimators. Section 6 
contains proofs. In Section 7 a technical lemma is proved. 

2. Framework and assumptions 
2.1. Model assumptions 

Let $(X_t)_{t \ge 0}$ be a solution of (1) and assume that $n + 1$ observations $X_{k\Delta}$, $k = 1, \ldots, n + 1$,
with sampling interval $\Delta$ are available. Throughout the paper, we assume that $\Delta = \Delta_n$
tends to 0 and $n\Delta_n$ tends to infinity as $n$ tends to infinity. To simplify notation, we write
$\Delta$ without the subscript $n$. Nevertheless, when speaking of constants, we mean quantities
that depend neither on $n$ nor on $\Delta$. We wish to estimate the drift function $b$ and the
diffusion coefficient $\sigma^2$ when $X$ is stationary and geometrically $\beta$-mixing. To this end,
we consider the following assumptions:

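Assumption 1.

(i) $b \in C^1(\mathbb{R})$ and there exists $\gamma \ge 0$ such that, for all $x \in \mathbb{R}$, $|b'(x)| \le \gamma(1 + |x|^\gamma)$.

(ii) There exists $b_0$ such that, for all $x$, $|b(x)| \le b_0(1 + |x|)$.

(iii) There exist $d \ge 0$, $r > 0$ and $R > 0$ such that, for all $|x| > R$, $\mathrm{sgn}(x)\,b(x) \le -r|x|^d$.

Assumption 2.

(i) There exist $\sigma_0^2$ and $\sigma_1^2$ such that, for all $x$, $0 < \sigma_0^2 \le \sigma^2(x) \le \sigma_1^2$, and there exists $L$
such that, for all $(x, y) \in \mathbb{R}^2$, $|\sigma(x) - \sigma(y)| \le L|x - y|^{1/2}$.

(ii) $\sigma \in C^2(\mathbb{R})$ and there exists $\gamma \ge 0$ such that, for all $x \in \mathbb{R}$, $|\sigma'(x)| + |\sigma''(x)| \le \gamma(1 + |x|^\gamma)$.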



Under Assumptions 1 and 2, equation (1) has a unique strong solution. Note that
Assumption 2(ii) is only used for the estimation of $\sigma^2$ and not for $b$. Elementary
computations show that the scale density

$$s(x) = \exp\Big\{-2 \int_0^x \frac{b(u)}{\sigma^2(u)}\,du\Big\}$$

satisfies $\int_{-\infty} s(x)\,dx = +\infty = \int^{+\infty} s(x)\,dx$, and the speed density $m(x) = 1/(\sigma^2(x)s(x))$
satisfies $\int_{-\infty}^{+\infty} m(x)\,dx = M < +\infty$. Hence, model (1) admits a unique invariant
probability $\pi(x)\,dx$ with $\pi(x) = M^{-1}m(x)$. Now we assume the following:

Assumption 3. $X_0 = \eta$ has distribution $\pi$.

Under the additional Assumption 3, $(X_t)$ is strictly stationary and ergodic. Moreover,
it follows from Proposition 1 in Pardoux and Veretennikov [29] that there exist constants
$K > 0$, $\nu > 0$ and $\theta > 0$ such that

$$\mathbb{E}(\exp(\nu|X_0|)) < +\infty \quad \text{and} \quad \beta_X(t) \le K e^{-\theta t}, \tag{2}$$

where $\beta_X(t)$ denotes the $\beta$-mixing coefficient of $(X_t)$ and is given by

$$\beta_X(t) = \int_{-\infty}^{+\infty} \pi(x)\,dx\, \|P_t(x, dx') - \pi(x')\,dx'\|_{TV}.$$

The norm $\|\cdot\|_{TV}$ is the total variation norm and $P_t$ denotes the transition probability. In
particular, $X_0$ has moments of any (positive) order. Now, Assumption 1(i) ensures that,
for all $t \ge 0$, $h > 0$ and $k \ge 1$, there exists $c = c(k, \gamma)$ such that

$$\mathbb{E}\Big( \sup_{s \in [t, t+h]} |b(X_s) - b(X_t)|^k \,\Big|\, \mathcal{F}_t \Big) \le c\,h^{k/2}(1 + |X_t|^c),$$

where $\mathcal{F}_t = \sigma(X_s, s \le t)$; see, for example, Gloter ([23], Proposition A). Thus, taking
expectations, there exists $c'$ such that

$$\mathbb{E}\Big( \sup_{s \in [t, t+h]} |b(X_s) - b(X_t)|^k \Big) \le c' h^{k/2}. \tag{3}$$

The functions $b$ and $\sigma^2$ are estimated only on a compact set $A$. For simplicity and without
loss of generality, we assume from now on that

$$A = [0, 1]. \tag{4}$$

It follows from Assumptions 1, 2(i) and 3 that the stationary density $\pi$ is bounded from
below and above on any compact subset of $\mathbb{R}$, and we denote by $\pi_0$, $\pi_1$ two positive real
numbers such that

$$0 < \pi_0 \le \pi(x) \le \pi_1, \quad \forall x \in A = [0, 1]. \tag{5}$$

2.2. Spaces of approximation: piecewise polynomials 

We aim to estimate the functions $b$ and $\sigma^2$ of model (1) on $[0, 1]$ using a data-driven
procedure. For that purpose, we consider families of finite-dimensional linear subspaces
of $L^2([0, 1])$ and compute for each space an associated least squares estimator. Then an
adaptive procedure chooses among the resulting collection of estimators the 'best' one,
in a sense that will be specified later, through a penalization device.

Several possible collections of spaces are available and discussed in Section 2.3. Now, 
to be consistent with the algorithm implemented in Section 5, we focus on a specific 
collection, namely the collection of dyadic regular piecewise polynomial spaces, henceforth 
denoted by [DP]. 

We fix an integer $r \ge 0$. Let $p \ge 0$ also be an integer. On each subinterval
$I_j = [(j - 1)/2^p, j/2^p]$, $j = 1, \ldots, 2^p$, consider $r + 1$ polynomials of degree $\ell$, $\varphi_{j,\ell}(x)$, $\ell = 0, 1, \ldots, r$,
and set $\varphi_{j,\ell}(x) = 0$ outside $I_j$. The space $S_m$, $m = (p, r)$, is defined as generated by the
$D_m = 2^p(r + 1)$ functions $(\varphi_{j,\ell})$. A function $t$ in $S_m$ may be written as

$$t(x) = \sum_{j=1}^{2^p} \sum_{\ell=0}^{r} t_{j,\ell}\,\varphi_{j,\ell}(x).$$

The collection of spaces $(S_m, m \in \mathcal{M}_n)$ is such that

$$\mathcal{M}_n = \{m = (p, r),\ p \in \mathbb{N},\ r \in \{0, 1, \ldots, r_{\max}\},\ 2^p(r_{\max} + 1) \le N_n\}. \tag{6}$$

In other words, $D_m \le N_n$, where $N_n \le n$. The maximal dimension $N_n$ is subject to
additional constraints given below. The role of $N_n$ is to bound all dimensions $D_m$, even
when $m$ is random. In practice, it corresponds to the maximal number of coefficients to
estimate. Thus it must not be too large.

More concretely, consider the orthogonal collection in $L^2([-1, 1])$ of Legendre
polynomials $(Q_\ell, \ell \ge 0)$, where the degree of $Q_\ell$ is equal to $\ell$, generating $L^2([-1, 1])$; see
Abramowitz and Stegun ([1], page 774). These satisfy $|Q_\ell(x)| \le 1$ for all $x \in [-1, 1]$,
$Q_\ell(1) = 1$ and $\int_{-1}^{1} Q_\ell^2(u)\,du = 2/(2\ell + 1)$. Then we set $P_\ell(x) = (2\ell + 1)^{1/2} Q_\ell(2x - 1)$ to
obtain an orthonormal basis of $L^2([0, 1])$. Finally,

$$\varphi_{j,\ell}(x) = 2^{p/2} P_\ell(2^p x - j + 1)\,\mathbf{1}_{I_j}(x), \quad j = 1, \ldots, 2^p, \ \ell = 0, 1, \ldots, r.$$



The space $S_m$ has dimension $D_m = 2^p(r + 1)$, and its orthonormal basis described above
satisfies

$$\Big\| \sum_{j=1}^{2^p} \sum_{\ell=0}^{r} \varphi_{j,\ell}^2 \Big\|_\infty \le D_m(r + 1) \le D_m(r_{\max} + 1).$$

Hence, for all $t \in S_m$, $\|t\|_\infty \le (r_{\max} + 1)^{1/2} D_m^{1/2} \|t\|$, where $\|t\|^2 = \int_0^1 t^2(x)\,dx$, for $t$ in
$L^2([0, 1])$, a property which is essential for the proofs.
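The construction of the [DP] basis can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `dp_basis` and `gram_matrix` and the choice $(p, r) = (2, 3)$ are illustrative and not part of the paper. It evaluates the rescaled Legendre functions on each dyadic interval, verifies orthonormality in $L^2([0, 1])$ by Gauss-Legendre quadrature, and computes the supremum of the sum of squared basis functions, to be compared with $D_m(r + 1)$.

```python
import numpy as np
from numpy.polynomial import legendre


def dp_basis(x, p, r):
    """Evaluate the [DP] basis functions at points x in [0, 1].

    Returns an array of shape (len(x), 2**p * (r + 1)); column j * (r + 1) + l
    holds the function of degree l on the (j + 1)-th dyadic interval (j from 0).
    """
    x = np.asarray(x, dtype=float)
    n_bins = 2 ** p
    # Index of the dyadic interval containing each point (x = 1 goes to the last one).
    j = np.minimum((x * n_bins).astype(int), n_bins - 1)
    u = x * n_bins - j  # local coordinate in [0, 1]
    out = np.zeros((x.size, n_bins * (r + 1)))
    for l in range(r + 1):
        coeffs = np.zeros(l + 1)
        coeffs[l] = 1.0  # selects the Legendre polynomial Q_l
        p_l = np.sqrt(2 * l + 1) * legendre.legval(2 * u - 1, coeffs)
        out[np.arange(x.size), j * (r + 1) + l] = np.sqrt(n_bins) * p_l
    return out


def gram_matrix(p, r, n_nodes=20):
    """Gram matrix of the basis in L^2([0, 1]), by Gauss-Legendre quadrature per interval."""
    nodes, weights = legendre.leggauss(n_nodes)
    n_bins = 2 ** p
    # Map the quadrature nodes from [-1, 1] to each dyadic interval.
    xs = np.concatenate([(j + (nodes + 1) / 2) / n_bins for j in range(n_bins)])
    ws = np.tile(weights / (2 * n_bins), n_bins)
    phi = dp_basis(xs, p, r)
    return phi.T @ (phi * ws[:, None])


p, r = 2, 3
gram = gram_matrix(p, r)
grid = np.linspace(0.0, 1.0, 2001)
sup_sum_sq = (dp_basis(grid, p, r) ** 2).sum(axis=1).max()
print(np.allclose(gram, np.eye(2 ** p * (r + 1))), sup_sum_sq, 2 ** p * (r + 1) ** 2)
```

On this grid the supremum is attained at the interval endpoints, where $|Q_\ell| = 1$, which is why the bound $D_m(r + 1)$ cannot be improved for this basis.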



2.3. Other spaces of approximation 



From both theoretical and practical points of view, other spaces can be considered, such as
the trigonometric spaces [T], where $S_m$ is generated by $\{1, 2^{1/2}\cos(2\pi jx), 2^{1/2}\sin(2\pi jx)$
for $j = 1, \ldots, m\}$, has dimension $D_m = 2m + 1$ and $m \in \mathcal{M}_n = \{1, \ldots, [n/2] - 1\}$; and the
dyadic wavelet-generated spaces [W] with regularity $r$ and compact support, as described,
for example, in Daubechies [19], Donoho et al. [20] or Hoffmann [25].

The key properties that must be fulfilled to fit in our framework are the following:
The key properties that must be fulfilled to fit in our framework are the following: 

$(\mathcal{H}_1)$ Norm connection: $(S_m)_{m \in \mathcal{M}_n}$ is a collection of finite-dimensional linear
subspaces of $L^2([0, 1])$, with dimension $\dim(S_m) = D_m$ such that $D_m \le N_n \le n$, for
all $m \in \mathcal{M}_n$, and satisfying:

There exists $\Phi_0 > 0$ such that, for all $m \in \mathcal{M}_n$, for all $t \in S_m$,

$$\|t\|_\infty \le \Phi_0 D_m^{1/2} \|t\|. \tag{7}$$

An orthonormal basis of $S_m$ is denoted by $(\varphi_\lambda)_{\lambda \in \Lambda_m}$, where $|\Lambda_m| = D_m$. It follows from
Birgé and Massart [13] that property (7) in the context of $(\mathcal{H}_1)$ is equivalent to:

There exists $\Phi_0 > 0$ such that

$$\Big\| \sum_{\lambda \in \Lambda_m} \varphi_\lambda^2 \Big\|_\infty \le \Phi_0^2 D_m. \tag{8}$$

Thus, for the collection [DP], (8) holds with $\Phi_0^2 = r_{\max} + 1$. Moreover, for results
concerning adaptive estimators, we need an additional assumption:

$(\mathcal{H}_2)$ Nesting condition: $(S_m)_{m \in \mathcal{M}_n}$ is a collection of models such that there exists a
space denoted by $\mathcal{S}_n$, belonging to the collection, with $S_m \subset \mathcal{S}_n$ for all $m \in \mathcal{M}_n$.
We denote by $N_n$ the dimension of $\mathcal{S}_n$: $\dim(\mathcal{S}_n) = N_n$ ($\forall m \in \mathcal{M}_n$, $D_m \le N_n$).

As far as possible below, we keep the notation general to allow extensions to spaces of 
approximation other than [DP] . 



3. Drift estimation 

3.1. Drift estimators: non-adaptive case 



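Let

$$Y_{k\Delta} = \frac{X_{(k+1)\Delta} - X_{k\Delta}}{\Delta} \quad \text{and} \quad Z_{k\Delta} = \frac{1}{\Delta} \int_{k\Delta}^{(k+1)\Delta} \sigma(X_s)\,dW_s. \tag{9}$$

The following standard regression-type decomposition holds:

$$Y_{k\Delta} = b(X_{k\Delta}) + Z_{k\Delta} + \frac{1}{\Delta} \int_{k\Delta}^{(k+1)\Delta} (b(X_s) - b(X_{k\Delta}))\,ds,$$

where $b(X_{k\Delta})$ is the main term, $Z_{k\Delta}$ the noise term and the last term is a negligible
residual.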






F. Comte, V. Genon-Catalot and Y. Rozenholc 



Now, for $S_m$ a space of the collection $\mathcal{M}_n$ and for $t \in S_m$, we consider the following
regression contrast:

$$\gamma_n(t) = \frac{1}{n} \sum_{k=1}^n [Y_{k\Delta} - t(X_{k\Delta})]^2. \tag{10}$$

The estimator belonging to $S_m$ is defined as

$$\hat b_m = \arg\min_{t \in S_m} \gamma_n(t). \tag{11}$$

A minimizer of $\gamma_n$ in $S_m$, $\hat b_m$, always exists but may not be unique. Indeed, in some
common situations the minimization of $\gamma_n$ over $S_m$ leads to an affine space of solutions.
Consequently, it becomes impossible to consider a classical $L^2$-risk for the 'least squares
estimator' of $b$ in $S_m$. In contrast, the random $\mathbb{R}^n$-vector $(\hat b_m(X_\Delta), \ldots, \hat b_m(X_{n\Delta}))'$ is
always uniquely defined. Indeed, let us denote by $\Pi_m$ the orthogonal projection (with
respect to the inner product of $\mathbb{R}^n$) onto the subspace $\{(t(X_\Delta), \ldots, t(X_{n\Delta}))', t \in S_m\}$ of
$\mathbb{R}^n$. Then $(\hat b_m(X_\Delta), \ldots, \hat b_m(X_{n\Delta}))' = \Pi_m Y$, where $Y = (Y_\Delta, \ldots, Y_{n\Delta})'$. This is the reason
why we define the risk of $\hat b_m$ by


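$$\mathbb{E}\Big[ \frac{1}{n} \sum_{k=1}^n (\hat b_m(X_{k\Delta}) - b(X_{k\Delta}))^2 \Big] = \mathbb{E}(\|\hat b_m - b\|_n^2),$$

where

$$\|t\|_n^2 = \frac{1}{n} \sum_{k=1}^n t^2(X_{k\Delta}). \tag{12}$$

Thus, our risk is the expectation of an empirical norm. Note that, for a deterministic
function $t$, $\mathbb{E}(\|t\|_n^2) = \|t\|_\pi^2 = \int t^2(x)\,d\pi(x)$ where $\pi$ denotes the stationary law. In view
of (5), the $L^2$-norm, $\|\cdot\|$, and the $L^2(\pi)$-norm, $\|\cdot\|_\pi$, are equivalent for $A$-supported
functions.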
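As an illustration of (9)-(12), here is a minimal sketch, assuming NumPy is available. The Ornstein-Uhlenbeck model, the parameter values and the choice $m = (p, r) = (0, 1)$ are assumptions made for the example and are not taken from the paper; the Ornstein-Uhlenbeck process is used only because its Gaussian transition can be sampled exactly. The monomials $1, x$ are used instead of the Legendre basis: they span the same space, so the projected vector $\Pi_m Y$ is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy model (not from the paper): Ornstein-Uhlenbeck process
# dX_t = -theta * X_t dt + sigma dW_t, sampled exactly through its Gaussian transition.
theta, sigma, delta, n = 1.0, 1.0, 0.05, 20000
x = np.empty(n + 2)
x[0] = rng.normal(0.0, sigma / np.sqrt(2 * theta))  # stationary start
decay = np.exp(-theta * delta)
step_sd = sigma * np.sqrt((1 - decay ** 2) / (2 * theta))
for k in range(n + 1):
    x[k + 1] = decay * x[k] + step_sd * rng.normal()

# Observations X_{k Delta}, k = 1, ..., n + 1, and responses Y_{k Delta} from (9).
x_obs = x[1:]
y = (x_obs[1:] - x_obs[:-1]) / delta  # length n
x_k = x_obs[:-1]

# Functions of S_m vanish outside A = [0, 1], so only points in A matter for the fit.
in_a = (x_k >= 0.0) & (x_k <= 1.0)

# Hypothetical choice m = (p, r) = (0, 1): one interval, polynomials of degree <= 1.
design = np.column_stack([np.ones(in_a.sum()), x_k[in_a]])
coef, *_ = np.linalg.lstsq(design, y[in_a], rcond=None)
fitted = design @ coef  # the projected vector, restricted to the points in A

residual = y[in_a] - fitted
# Empirical squared distance (12) between the estimate and the true drift on A.
emp_risk = np.sum((fitted - (-theta * x_k[in_a])) ** 2) / n
print(coef, emp_risk)
```

The least-squares coefficients may fail to be unique when the design matrix is rank deficient, but `fitted` is always well defined, which mirrors the reason given above for measuring the risk in the empirical norm.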



3.2. Risk of the non-adaptive drift estimator 

Using (9), (10) and (12), we have

$$\gamma_n(t) - \gamma_n(b) = \|t - b\|_n^2 + \frac{2}{n} \sum_{k=1}^n (b - t)(X_{k\Delta}) Z_{k\Delta} + \frac{2}{n\Delta} \sum_{k=1}^n (b - t)(X_{k\Delta}) \int_{k\Delta}^{(k+1)\Delta} (b(X_s) - b(X_{k\Delta}))\,ds.$$

In view of this decomposition, we define the centred empirical process

$$\nu_n(t) = \frac{1}{n} \sum_{k=1}^n t(X_{k\Delta}) Z_{k\Delta}. \tag{13}$$


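Now denote by $b_m$ the orthogonal projection of $b$ onto $S_m$. By definition of $\hat b_m$, $\gamma_n(\hat b_m) \le
\gamma_n(b_m)$. So $\gamma_n(\hat b_m) - \gamma_n(b) \le \gamma_n(b_m) - \gamma_n(b)$. This implies

$$\|\hat b_m - b\|_n^2 \le \|b_m - b\|_n^2 + 2\nu_n(\hat b_m - b_m) + \frac{2}{n\Delta} \sum_{k=1}^n (\hat b_m - b_m)(X_{k\Delta}) \int_{k\Delta}^{(k+1)\Delta} (b(X_s) - b(X_{k\Delta}))\,ds.$$

The functions $\hat b_m$ and $b_m$ being $A$-supported, we can cancel the terms $\|b\mathbf{1}_{A^c}\|_n^2$ that
appear in both sides of the inequality. This yields

$$\|\hat b_m - b\mathbf{1}_A\|_n^2 \le \|b_m - b\mathbf{1}_A\|_n^2 + 2\nu_n(\hat b_m - b_m) + \frac{2}{n\Delta} \sum_{k=1}^n (\hat b_m - b_m)(X_{k\Delta}) \int_{k\Delta}^{(k+1)\Delta} (b(X_s) - b(X_{k\Delta}))\,ds. \tag{14}$$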

On the basis of this inequality, we obtain the following result. 

Proposition 1. Let $\Delta = \Delta_n$ be such that $\Delta_n \to 0$, $n\Delta_n/\ln^2(n) \to +\infty$ when $n \to +\infty$.
Suppose that Assumptions 1, 2(i) and 3 hold and consider a space $S_m$ in the collection
[DP] with $N_n = o(n\Delta/\ln^2(n))$ ($N_n$ is defined in $(\mathcal{H}_2)$). Then the estimator $\hat b_m$ of $b$ is
such that

$$\mathbb{E}(\|\hat b_m - b_A\|_n^2) \le 7\pi_1\|b_m - b_A\|^2 + K\,\frac{\mathbb{E}(\sigma^2(X_0))\,D_m}{n\Delta} + K'\Delta + \frac{K''}{n\Delta}, \tag{15}$$

where $b_A = b\mathbf{1}_A$ and $K$, $K'$ and $K''$ are positive constants.

As a consequence, it is natural to select the dimension $D_m$ that leads to the best
compromise between the squared bias term $\|b_m - b_A\|^2$ and the variance term of order
$D_m/(n\Delta)$.

To compare the result of Proposition 1 with the optimal nonparametric rates
exhibited by Hoffmann [25], let us assume that $b_A$ belongs to a ball of some Besov space,
$b_A \in \mathcal{B}_{\alpha,2,\infty}([0, 1])$, and that $r + 1 \ge \alpha$. Then, for $\|b_A\|_{\alpha,2,\infty} \le L$, we have $\|b_A - b_m\|^2 \le
C(\alpha, L) D_m^{-2\alpha}$. Thus, choosing $D_m = (n\Delta)^{1/(2\alpha+1)}$, we obtain

$$\mathbb{E}(\|\hat b_m - b_A\|_n^2) \le C(\alpha, L)(n\Delta)^{-2\alpha/(2\alpha+1)} + K'\Delta + \frac{K''}{n\Delta}. \tag{16}$$

The first term $(n\Delta)^{-2\alpha/(2\alpha+1)}$ is exactly the optimal nonparametric rate (see Hoffmann
[25]). Moreover, under the standard condition $\Delta = o(1/(n\Delta))$, the last two terms in (15)
are $O(1/(n\Delta))$, which is negligible with respect to $(n\Delta)^{-2\alpha/(2\alpha+1)}$.
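To make the orders of magnitude in (16) concrete, the following short sketch evaluates the three terms with all constants set to 1. The function name `drift_rate_terms` and the numerical values ($n = 10^6$, $\Delta = n^{-2/3}$, $\alpha = 2$) are hypothetical and chosen only so that $n\Delta^2$ is small.

```python
def drift_rate_terms(n, delta, alpha):
    """Return (D_m, rate term, Delta term, 1/(n*Delta) term) from (16), constants set to 1."""
    t = n * delta
    d_m = t ** (1.0 / (2 * alpha + 1))
    rate = t ** (-2.0 * alpha / (2 * alpha + 1))
    return d_m, rate, delta, 1.0 / t


# Hypothetical values: n = 10**6 observations, Delta = n**(-2/3), smoothness alpha = 2.
n = 10 ** 6
delta = n ** (-2.0 / 3.0)
d_m, rate, delta_term, inv_term = drift_rate_terms(n, delta, alpha=2.0)
print(d_m, rate, delta_term, inv_term)
```

With this choice of $D_m$, the variance term $D_m/(n\Delta)$ coincides with the rate term, which is the bias-variance balance described above, and both residual terms are smaller than the rate term.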
Proposition 1 holds for the wavelet basis [W] under the same assumptions. For the
trigonometric basis [T], the additional constraint $N_n \le O((n\Delta)^{1/2}/\ln(n))$ is necessary.
Hence, when working with these bases, if $b_A \in \mathcal{B}_{\alpha,2,\infty}([0, 1])$ as above, the optimal rate
is reached for the same choice for $D_m$, under the additional constraint that $\alpha > 1/2$ for
[T]. It is worth stressing that $\alpha > 1/2$ automatically holds under Assumption 1.

3.3. Adaptive drift estimator 

As a second step, we must ensure an automatic selection of Dm, which does not use any 
knowledge of b, and in particular which does not require a to be known. The standard 
selection is 

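$$\hat m = \arg\min_{m \in \mathcal{M}_n} [\gamma_n(\hat b_m) + \mathrm{pen}(m)], \tag{17}$$

with $\mathrm{pen}(m)$ a penalty to be chosen appropriately. We denote by $\hat b_{\hat m}$ the resulting
estimator and we need to determine $\mathrm{pen}(\cdot)$ such that, ideally,

$$\mathbb{E}(\|\hat b_{\hat m} - b_A\|_n^2) \le C \inf_{m \in \mathcal{M}_n} \Big( \|b_A - b_m\|^2 + \frac{\mathbb{E}(\sigma^2(X_0))\,D_m}{n\Delta} \Big) + K'\Delta + \frac{K''}{n\Delta},$$

with $C$ a constant which should not be too large. We almost achieve this aim.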

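Theorem 1. Let $\Delta = \Delta_n$ be such that $\Delta_n \to 0$, $n\Delta_n/\ln^2(n) \to +\infty$ when $n \to +\infty$.
Suppose that Assumptions 1, 2(i) and 3 hold and consider the nested collection of models
[DP] with maximal dimension $N_n = o(n\Delta/\ln^2(n))$. Let

$$\mathrm{pen}(m) \ge \kappa\sigma_1^2\,\frac{D_m}{n\Delta}, \tag{18}$$

where $\kappa$ is a universal constant. Then the estimator $\hat b_{\hat m}$ of $b$ with $\hat m$ defined in (17) is
such that

$$\mathbb{E}(\|\hat b_{\hat m} - b_A\|_n^2) \le C \inf_{m \in \mathcal{M}_n} (\|b_m - b_A\|^2 + \mathrm{pen}(m)) + K'\Delta + \frac{K''}{n\Delta}. \tag{19}$$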

Some comments are in order. It is possible to choose $\mathrm{pen}(m) = \kappa\sigma_1^2 D_m/(n\Delta)$, but this
is not what is done in practice. It is better to calibrate additional terms. This is explained
in Section 5.2. The constant $\kappa$ in the penalty is numerical and must be calibrated for the
problem. Its value is usually adapted by intensive simulation experiments. This point is
also discussed in Section 5.2. From (15), one would expect to obtain $\mathbb{E}(\sigma^2(X_0))$ instead
of $\sigma_1^2$ in (18): we do not know if this is the consequence of technical problems or if it is
a structural result. Another important point is that $\sigma_1^2$ is unknown. In practice, we just
replace it by a rough estimator (see Section 5.2).

From (19), we deduce that the adaptive estimator automatically realizes the
bias-variance compromise: whenever $b_A$ belongs to some Besov ball (see (16)), if $r + 1 \ge \alpha$
and $n\Delta^2 = o(1)$, $\hat b_{\hat m}$ achieves the optimal corresponding nonparametric rate, without
logarithmic loss, contrary to Hoffmann's adaptive estimator (see Hoffmann [25],
page 159, Theorem 5). As mentioned above, Theorem 1 holds for the basis [W] and,
if $N_n = o((n\Delta)^{1/2}/\ln(n))$, for [T].

4. Adaptive estimation of the diffusion coefficient 
4.1. Diffusion coefficient estimator: non-adaptive case 

To estimate $\sigma^2$ on $A = [0, 1]$, we define

$$\hat\sigma_m^2 = \arg\min_{t \in S_m} \breve\gamma_n(t), \quad \text{with} \quad \breve\gamma_n(t) = \frac{1}{n} \sum_{k=1}^n [U_{k\Delta} - t(X_{k\Delta})]^2, \tag{20}$$

and

$$U_{k\Delta} = \frac{(X_{(k+1)\Delta} - X_{k\Delta})^2}{\Delta}. \tag{21}$$

For diffusion coefficient estimation under our asymptotic framework, it is now well known
that rates of convergence are faster than for drift estimation. This is the reason why the
regression-type equation has to be more precise than for $b$. Let us set

$$\psi = 2\sigma'\sigma b + [(\sigma')^2 + \sigma\sigma'']\sigma^2. \tag{22}$$

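Some computations using Itô's formula and Fubini's theorem lead to

$$U_{k\Delta} = \sigma^2(X_{k\Delta}) + V_{k\Delta} + R_{k\Delta},$$

where $V_{k\Delta} = V_{k\Delta}^{(1)} + V_{k\Delta}^{(2)} + V_{k\Delta}^{(3)}$, with

$$V_{k\Delta}^{(1)} = \frac{1}{\Delta} \Big[ \Big( \int_{k\Delta}^{(k+1)\Delta} \sigma(X_s)\,dW_s \Big)^2 - \int_{k\Delta}^{(k+1)\Delta} \sigma^2(X_s)\,ds \Big],$$

$$V_{k\Delta}^{(2)} = \frac{1}{\Delta} \int_{k\Delta}^{(k+1)\Delta} ((k+1)\Delta - s)\,\sigma'(X_s)\sigma^2(X_s)\,dW_s,$$

$$V_{k\Delta}^{(3)} = 2b(X_{k\Delta}) \int_{k\Delta}^{(k+1)\Delta} \sigma(X_s)\,dW_s,$$

and

$$R_{k\Delta} = \frac{1}{\Delta} \Big( \int_{k\Delta}^{(k+1)\Delta} b(X_s)\,ds \Big)^2 + \frac{2}{\Delta} \int_{k\Delta}^{(k+1)\Delta} (b(X_s) - b(X_{k\Delta}))\,ds \int_{k\Delta}^{(k+1)\Delta} \sigma(X_s)\,dW_s + \frac{1}{\Delta} \int_{k\Delta}^{(k+1)\Delta} [(k+1)\Delta - s]\,\psi(X_s)\,ds.$$

Obviously, the main noise term in the above decomposition must be $V_{k\Delta}^{(1)}$, as will be
proved below.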
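The role of $V_{k\Delta}^{(1)}$ can be seen in the simplest possible case. The sketch below assumes a toy model that is not one of the paper's examples: $b = 0$ and constant $\sigma$, so that $X_t = \sigma W_t$, the remainder $R_{k\Delta}$ and the terms $V^{(2)}$, $V^{(3)}$ vanish, and $U_{k\Delta} = \sigma^2 + V_{k\Delta}^{(1)}$ exactly. Only the Python standard library is used.

```python
import random

random.seed(1)

# Assumed toy model (not from the paper): X_t = sigma * W_t, i.e. b = 0 and constant sigma.
sigma, delta, n = 2.0, 0.05, 200000
increments = [sigma * random.gauss(0.0, delta ** 0.5) for _ in range(n)]

# U_{k Delta} from (21); here R = 0, V^(2) = V^(3) = 0 and V^(1) = U - sigma^2.
u = [dx * dx / delta for dx in increments]
v1 = [uk - sigma ** 2 for uk in u]

mean_u = sum(u) / n
var_v1 = sum(v * v for v in v1) / n
print(mean_u, var_v1)  # close to sigma^2 and to 2 * sigma^4, up to Monte Carlo error
```

In this case the noise variance is $2\sigma^4$ and does not depend on $\Delta$, which is consistent with a variance term of order $D_m/n$ for $\sigma^2$ rather than $D_m/(n\Delta)$ as for the drift.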

4.2. Risk of the non-adaptive estimator 

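As for the drift, we write

$$\breve\gamma_n(t) - \breve\gamma_n(\sigma^2) = \|\sigma^2 - t\|_n^2 + \frac{2}{n} \sum_{k=1}^n (\sigma^2 - t)(X_{k\Delta}) V_{k\Delta} + \frac{2}{n} \sum_{k=1}^n (\sigma^2 - t)(X_{k\Delta}) R_{k\Delta}.$$

We denote by $\sigma_m^2$ the orthogonal projection of $\sigma^2$ on $S_m$ and define

$$\breve\nu_n(t) = \frac{1}{n} \sum_{k=1}^n t(X_{k\Delta}) V_{k\Delta}.$$

Again we use the fact that $\breve\gamma_n(\hat\sigma_m^2) - \breve\gamma_n(\sigma^2) \le \breve\gamma_n(\sigma_m^2) - \breve\gamma_n(\sigma^2)$ to obtain

$$\|\hat\sigma_m^2 - \sigma^2\|_n^2 \le \|\sigma_m^2 - \sigma^2\|_n^2 + 2\breve\nu_n(\hat\sigma_m^2 - \sigma_m^2) + \frac{2}{n} \sum_{k=1}^n (\hat\sigma_m^2 - \sigma_m^2)(X_{k\Delta}) R_{k\Delta}.$$

Analogously to what was done for the drift, we can cancel on both sides the common
term $\|\sigma^2\mathbf{1}_{A^c}\|_n^2$. This yields

$$\|\hat\sigma_m^2 - \sigma_A^2\|_n^2 \le \|\sigma_m^2 - \sigma_A^2\|_n^2 + 2\breve\nu_n(\hat\sigma_m^2 - \sigma_m^2) + \frac{2}{n} \sum_{k=1}^n (\hat\sigma_m^2 - \sigma_m^2)(X_{k\Delta}) R_{k\Delta}. \tag{23}$$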

We obtain the following result. 

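Proposition 2. Let $\Delta = \Delta_n$ be such that $\Delta_n \to 0$, $n\Delta_n/\ln^2(n) \to +\infty$ when $n \to +\infty$.
Suppose that Assumptions 1-3 hold and consider a model $S_m$ in the collection [DP] with
$N_n = o(n\Delta/\ln^2(n))$, where $N_n$ is defined in $(\mathcal{H}_2)$. Then the estimator $\hat\sigma_m^2$ of $\sigma^2$ defined
by (20) is such that

$$\mathbb{E}(\|\hat\sigma_m^2 - \sigma_A^2\|_n^2) \le 7\pi_1\|\sigma_m^2 - \sigma_A^2\|^2 + K\,\frac{\mathbb{E}(\sigma^4(X_0))\,D_m}{n} + K'\Delta^2 + \frac{K''}{n}, \tag{24}$$

where $\sigma_A^2 = \sigma^2\mathbf{1}_A$, and $K$, $K'$, $K''$ are positive constants.

Let us make some comments on the rates of convergence. If $\sigma_A^2$ belongs to a ball
of some Besov space, say $\sigma_A^2 \in \mathcal{B}_{\alpha,2,\infty}([0, 1])$, and $\|\sigma_A^2\|_{\alpha,2,\infty} \le L$, with $r + 1 \ge \alpha$, then
$\|\sigma_A^2 - \sigma_m^2\|^2 \le C(\alpha, L) D_m^{-2\alpha}$. Therefore, if we choose $D_m = n^{1/(2\alpha+1)}$, we obtain

$$\mathbb{E}(\|\hat\sigma_m^2 - \sigma_A^2\|_n^2) \le C(\alpha, L)\,n^{-2\alpha/(2\alpha+1)} + K'\Delta^2 + \frac{K''}{n}. \tag{25}$$

The first term $n^{-2\alpha/(2\alpha+1)}$ is the optimal nonparametric rate proved by Hoffmann [25].

Moreover, under the standard condition $\Delta^2 = o(1/n)$, the last two terms are $O(1/n)$,
that is, negligible with respect to $n^{-2\alpha/(2\alpha+1)}$.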

4.3. Adaptive diffusion coefficient estimator 

As previously, the second step is to ensure an automatic selection of $D_m$, which does not
use any knowledge on $\sigma^2$. This selection is done by

$$\hat m = \arg\min_{m \in \mathcal{M}_n} [\breve\gamma_n(\hat\sigma_m^2) + \widetilde{\mathrm{pen}}(m)]. \tag{26}$$

We denote by $\hat\sigma_{\hat m}^2$ the resulting estimator and we need to determine the penalty $\widetilde{\mathrm{pen}}$ as
for $b$. For simplicity, we use the same notation $\hat m$ in (26) as in (17) although they are
different. We can prove the following theorem.

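Theorem 2. Let $\Delta = \Delta_n$ be such that $\Delta_n \to 0$, $n\Delta_n/\ln^2(n) \to +\infty$ when $n \to +\infty$.

Suppose that Assumptions 1-3 hold. Consider the nested collection of models [DP] with
maximal dimension $N_n \le n\Delta/\ln^2(n)$. Let

$$\widetilde{\mathrm{pen}}(m) \ge \kappa\sigma_1^4\,\frac{D_m}{n}, \tag{27}$$

where $\kappa$ is a universal constant. Then, the estimator $\hat\sigma_{\hat m}^2$ of $\sigma^2$ with $\hat m$ defined by (26) is
such that

$$\mathbb{E}(\|\hat\sigma_{\hat m}^2 - \sigma_A^2\|_n^2) \le C \inf_{m \in \mathcal{M}_n} (\|\sigma_m^2 - \sigma_A^2\|^2 + \widetilde{\mathrm{pen}}(m)) + K'\Delta^2 + \frac{K''}{n}. \tag{28}$$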

As for the drift, it is possible to choose $\widetilde{\mathrm{pen}}(m) = \kappa\sigma_1^4 D_m/n$, but this is not what is
done in practice. Moreover, making such a choice, it follows from (28) that the adaptive
estimator automatically realizes the bias-variance compromise. Whenever $\sigma_A^2$ belongs to
some Besov ball (see (25)), if $n\Delta^2 = o(1)$ and $r + 1 \ge \alpha$, $\hat\sigma_{\hat m}^2$ achieves the optimal
corresponding nonparametric rate $n^{-2\alpha/(2\alpha+1)}$, without logarithmic loss, contrary to
Hoffmann's adaptive estimator (see Hoffmann [25], page 160, Theorem 6). As mentioned for
$b$, Proposition 2 and Theorem 2 hold for the basis [W] under the same assumptions on
$N_n$. For [T], $N_n = o((n\Delta)^{1/2}/\ln(n))$ is needed.

5. Examples and numerical simulation results 

In this section, we consider examples of diffusions and implement the estimation algo- 
rithms on simulated data. To simulate sample paths of diffusion, we use the retrospective 
exact simulation algorithms proposed by Beskos et al. [10] and Beskos and Roberts [9]. 
Contrary to the Euler scheme, these algorithms produce exact simulation of diffusions
under some assumptions on the drift and diffusion coefficient. Therefore, we choose our
examples in order to meet these conditions in addition to our set of assumptions. For
the sake of simplicity, we focus on models that can be simulated by the simplest
algorithm of Beskos et al. [10], which is called EA1. More precisely, consider a diffusion model
given by the stochastic differential equation

$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t. \tag{29}$$

We assume that there is a one-to-one mapping $F$ on $\mathbb{R}$ such that $\xi_t = F(X_t)$ satisfies

$$d\xi_t = \alpha(\xi_t)\,dt + dW_t. \tag{30}$$

To produce an exact realization of the random variable $\xi_\Delta$ given that $\xi_0 = x$, the exact
algorithm EA1 requires that $\alpha$ be $C^1$, and $\alpha^2 + \alpha'$ be bounded from below and above.
Moreover, setting $A(\xi) = \int_0^\xi \alpha(u)\,du$, the function

$$h(\xi) = \exp(A(\xi) - (\xi - x)^2/(2\Delta)) \tag{31}$$

must be integrable on $\mathbb{R}$, and an exact realization of a random variable with
density proportional to $h$ must be possible. Provided that the process $(\xi_t)$ admits a
stationary distribution that it may also be possible to simulate, using the Markov
property, the algorithm can therefore produce an exact realization of a discrete sample
$(\xi_{k\Delta}, k = 0, 1, \ldots, n + 1)$ in the stationary regime. We deduce an exact realization of
$(X_{k\Delta} = F^{-1}(\xi_{k\Delta}), k = 0, \ldots, n + 1)$.

In all examples, we estimate the drift function $\alpha(\xi)$ and the constant 1 for models like
(30) or both the drift $b(x)$ and the diffusion coefficient $\sigma^2(x)$ for models like (29). Let
us note that Assumptions 1-3 are fulfilled for all the models $(\xi_t)$ below. For the models
$(X_t)$, the ergodicity and the exponential $\beta$-mixing property hold.

5.1. Examples of diffusions 

5.1.1. Family 1 

First, we consider (29) with

$$b(x) = -\theta x, \qquad \sigma(x) = c(1 + x^2)^{1/2}. \tag{32}$$

Standard computations of the scale and speed densities show that the model is positive
recurrent for $\theta + c^2/2 > 0$. In this case, its stationary distribution has density

$$\pi(x) \propto \frac{1}{(1 + x^2)^{1 + \theta/c^2}}.$$

If $X_0 = \eta$ has distribution $\pi(x)\,dx$, then, setting $\nu = 1 + 2\theta/c^2$, $\nu^{1/2}\eta$ has Student
distribution $t(\nu)$ which can be easily simulated.
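The link with the Student distribution can be verified directly on the densities. The following sketch uses only the standard library; the helper names and the values $\theta = 6$, $c = 2$ are illustrative. It compares the density of $\eta$ implied by $\nu^{1/2}\eta \sim t(\nu)$ with the unnormalized stationary density above and checks that their ratio is constant.

```python
import math


def stationary_density_unnormalized(x, theta, c):
    """pi(x) up to a constant, for b(x) = -theta x and sigma(x) = c (1 + x^2)^(1/2)."""
    return (1.0 + x * x) ** (-(1.0 + theta / c ** 2))


def student_density(t, nu):
    """Density of the Student t distribution with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm) * (1.0 + t * t / nu) ** (-(nu + 1) / 2)


theta, c = 6.0, 2.0  # illustrative values
nu = 1.0 + 2.0 * theta / c ** 2

# If sqrt(nu) * eta ~ t(nu), then eta has density sqrt(nu) * f_nu(sqrt(nu) * x).
ratios = [
    math.sqrt(nu) * student_density(math.sqrt(nu) * x, nu)
    / stationary_density_unnormalized(x, theta, c)
    for x in (-2.0, -0.5, 0.0, 0.3, 1.0, 4.0)
]
print(ratios)  # a constant list: the two densities are proportional
```

The constant is the normalizing factor $M^{-1}$ of the stationary density, so a stationary starting point can be drawn as a Student variable divided by $\nu^{1/2}$.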
Figure 1. $dX_t = -(\theta/c + c/2)\tanh(cX_t)\,dt + dW_t$, $n = 5000$, $\Delta = 1/20$, $\theta = 6$, $c = 2$. Dotted line,
true; solid line, estimate. The algorithm selects $(p, r)$ equal to (0, 1) for the drift, (0, 2) for $\sigma^2$.

We now consider $F_1(x) = \int_0^x 1/(c(1 + u^2)^{1/2})\,du = \mathrm{argsinh}(x)/c$. By the Itô formula,
$\xi_t = F_1(X_t)$ satisfies (30) with

$$\alpha(\xi) = -(\theta/c + c/2)\tanh(c\xi). \tag{33}$$

Assumptions 1-3 hold for $(\xi_t)$ with $\xi_0 = F_1(X_0)$. Moreover,

$$\alpha^2(\xi) + \alpha'(\xi) = \{(\theta/c + c/2)^2 + \theta + c^2/2\}\tanh^2(c\xi) - (\theta + c^2/2)$$

is bounded from below and above. And

$$A(\xi) = \int_0^\xi \alpha(u)\,du = -(1/2 + \theta/c^2)\log(\cosh(c\xi)) \le 0,$$

so that $\exp(A(\xi)) \le 1$. Therefore, function (31) is integrable for all $x$ and, by a
simple rejection method, we can produce a realization of a random variable with density
proportional to $h(\xi)$ using a random variable with density $\mathcal{N}(x, \Delta)$.
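The rejection step just described can be sketched as follows, using only the standard library. This is only the draw from the density proportional to $h$ in (31) for Family 1, not the full EA1 algorithm of Beskos et al. [10]; the function name `sample_h` and the parameter values are assumptions made for the example. A proposal from $\mathcal{N}(x, \Delta)$ is accepted with probability $\exp(A(\xi)) = \cosh(c\xi)^{-(1/2 + \theta/c^2)} \le 1$.

```python
import math
import random


def sample_h(x, delta, theta, c, rng):
    """Draw from the density proportional to h in (31) for Family 1, by rejection.

    Proposal: N(x, delta). Acceptance probability: exp(A(xi)) <= 1.
    """
    power = 0.5 + theta / c ** 2
    while True:
        xi = rng.gauss(x, math.sqrt(delta))
        accept_prob = math.cosh(c * xi) ** (-power)  # exp(A(xi))
        if rng.random() <= accept_prob:
            return xi


rng = random.Random(2)
draws = [sample_h(0.0, 0.05, theta=6.0, c=2.0, rng=rng) for _ in range(2000)]
print(sum(draws) / len(draws))  # near 0 by symmetry when x = 0
```

Because the envelope constant is 1, the acceptance rate equals the normalizing constant of $h$ relative to the Gaussian proposal, so the loop terminates quickly when $\Delta$ is small.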

Note that model (29) satisfies Assumptions 1-3 except that $\sigma^2(x)$ is not bounded from
above. Nevertheless, since $X_t = F_1^{-1}(\xi_t) = \sinh(c\xi_t)$, the process $(X_t)$ is exponentially
$\beta$-mixing. The upper bound $\sigma_1^2$ that appears explicitly in the penalty function must be
replaced by an estimated upper bound.
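The drift (33) of the transformed process can be checked numerically. Since $F_1' = 1/\sigma$, Itô's formula gives the drift $b/\sigma - \sigma'/2$ evaluated at $x = \sinh(c\xi)$; the sketch below, with illustrative parameter values and helper names that are not from the paper, compares this expression with the closed form.

```python
import math

theta, c = 6.0, 2.0  # illustrative values


def b(x):
    return -theta * x


def sigma(x):
    return c * math.sqrt(1.0 + x * x)


def sigma_prime(x):
    return c * x / math.sqrt(1.0 + x * x)


def drift_of_transformed(xi):
    """Drift of xi_t = F_1(X_t) from Ito's formula: b/sigma - sigma'/2 at x = sinh(c xi)."""
    x = math.sinh(c * xi)
    return b(x) / sigma(x) - 0.5 * sigma_prime(x)


def alpha(xi):
    """Closed form (33)."""
    return -(theta / c + c / 2.0) * math.tanh(c * xi)


max_gap = max(abs(drift_of_transformed(xi) - alpha(xi)) for xi in (-1.5, -0.2, 0.0, 0.7, 2.0))
print(max_gap)
```

The two expressions agree up to floating-point error because $x/(1 + x^2)^{1/2} = \tanh(c\xi)$ when $x = \sinh(c\xi)$.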
5.1.2. Family 2

For the second family of models, we start with an equation of type (30) where the drift
is now (see Barndorff-Nielsen [7])

$$\alpha(\xi) = -\frac{\theta\xi}{(1 + c^2\xi^2)^{1/2}}. \tag{34}$$

The model for $(\xi_t)$ is positive recurrent on $\mathbb{R}$ for $\theta > 0$. Its stationary distribution is given
by

$$\pi(\xi)\,d\xi \propto \exp\Big( -2\frac{\theta}{c^2}(1 + c^2\xi^2)^{1/2} \Big) = \exp\Big( -2\frac{\theta}{c}|\xi| \Big)\exp(\varphi(\xi)),$$

where $\exp(\varphi(\xi)) \le 1$ so that a random variable with distribution $\pi(\xi)\,d\xi$ can be
simulated by simple rejection method using a double exponential variable with
distribution proportional to $\exp(-2\theta|\xi|/c)$. The conditions required to perform an exact
simulation of $(\xi_t)$ hold. More precisely, $\alpha^2 + \alpha'$ is bounded from below and above and
$A(\xi) = \int_0^\xi \alpha(u)\,du = -(\theta/c^2)(1 + c^2\xi^2)^{1/2}$, so that $\exp(A(\xi)) \le 1$, (31) is integrable and
we can produce a realization of a random variable with density proportional to (31).
Lastly, Assumptions 1-3 also hold for this model.




Figure 2. $dX_t = -\theta X_t\,dt + c(1 + X_t^2)^{1/2}\,dW_t$, $n = 5000$, $\Delta = 1/20$, $\theta = 6$, $c = 2$. Dotted line,
true; solid line, estimate. The algorithm selects $(p, r)$ equal to (0, 1) for the drift, (0, 2) for $\sigma^2$.

Figure 3. $dX_t = -[\theta + c^2/(2\cosh(X_t))](\sinh(X_t)/\cosh^2(X_t))\,dt + (c/\cosh(X_t))\,dW_t$, $n = 5000$,
$\Delta = 1/20$, $\theta = 3$, $c = 2$. Dotted line, true; solid line, estimate. The algorithm selects $(p, r)$ equal
to (0, 2) for the drift, (0, 3) for $\sigma^2$.



We now consider $X_t = F_2(\xi_t) = \mathrm{argsinh}(c\xi_t)$, which satisfies a stochastic differential
equation with coefficients

$$b(x) = -\Big[ \theta + \frac{c^2}{2\cosh(x)} \Big] \frac{\sinh(x)}{\cosh^2(x)}, \qquad \sigma(x) = \frac{c}{\cosh(x)}. \tag{35}$$

The process $(X_t)$ is exponentially $\beta$-mixing as $(\xi_t)$. The diffusion coefficient $\sigma(x)$ is not
bounded from below but has an upper bound.

To obtain a different shape for the diffusion coefficient, showing two bumps, we consider
$X_t = G(\xi_t) = \mathrm{argsinh}(\xi_t - 5) + \mathrm{argsinh}(\xi_t + 5)$ where $(\xi_t)$ is as in (30)-(34). The function
$G(\cdot)$ is invertible and its inverse has the explicit expression

$$G^{-1}(x) = \frac{1}{2^{1/2}\sinh(x)} [49\sinh^2(x) + 100 + \cosh(x)(\sinh^2(x) - 100)]^{1/2}.$$

The diffusion coefficient of $(X_t)$ is given by

$$\sigma(x) = \frac{1}{(1 + (G^{-1}(x) - 5)^2)^{1/2}} + \frac{1}{(1 + (G^{-1}(x) + 5)^2)^{1/2}}. \tag{36}$$

The drift is given by $b(x) = G'(G^{-1}(x))\,\alpha(G^{-1}(x)) + \frac{1}{2}G''(G^{-1}(x))$.
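The explicit inverse of $G$ can be checked numerically, which also gives a direct way to evaluate (36). The sketch below uses only the standard library; the function names are illustrative, and the formula is applied away from $x = 0$, where it is a $0/0$ expression.

```python
import math


def g(xi):
    return math.asinh(xi - 5.0) + math.asinh(xi + 5.0)


def g_inverse(x):
    """Explicit inverse of G as given in the text (x != 0)."""
    s, ch = math.sinh(x), math.cosh(x)
    inner = 49.0 * s * s + 100.0 + ch * (s * s - 100.0)
    return math.sqrt(inner) / (math.sqrt(2.0) * s)


def g_prime(xi):
    """G'(xi), so that sigma(x) = G'(G^{-1}(x)) as in (36)."""
    return 1.0 / math.sqrt(1.0 + (xi - 5.0) ** 2) + 1.0 / math.sqrt(1.0 + (xi + 5.0) ** 2)


for xi in (-7.0, -3.0, 1.0, 5.0, 9.0):
    x = g(xi)
    print(xi, g_inverse(x), g_prime(g_inverse(x)))
```

The printed values of $G'$ are largest near $\xi = \pm 5$, which is the source of the two bumps in the diffusion coefficient.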







Figure 4. Two paths for the two-bumps diffusion coefficient model $X_t = G(\xi_t)$, $d\xi_t =
-\theta\xi_t/(1 + c^2\xi_t^2)^{1/2}\,dt + dW_t$, $G(x) = \mathrm{argsinh}(x - 5) + \mathrm{argsinh}(x + 5)$, $n = 5000$, $\Delta = 1/20$, $\theta = 1$,
$c = 10$. Dotted line, true; solid line, estimate. The algorithm selects $(p, r)$ equal to (0, 3) (above)
and (2, 0) (below) for the drift, (0, 6) (above) and (1, 3) (below) for $\sigma^2$.
We do not give here a complete Monte Carlo study but we illustrate how the algorithm
works and what kind of estimate it delivers visually.

We consider the regular collection [DP] (see Section 2.2). The algorithm minimizes the
mean square contrast and selects the space of approximation in the sense that it selects $p$
and $r$ for integers $p$ and $r$ such that $2^p(r + 1) \le N_n \le n\Delta/\ln^2(n)$ and $r \in \{0, 1, \ldots, r_{\max}\}$.
Note that the degree is global in the sense that it is the same on all the intervals of the
subdivision. We take $r_{\max} = 9$ in practice. Moreover, additive (but negligible) correcting
terms are classically involved in the penalty (see Comte and Rozenholc [17]). Such terms
avoid underpenalization and are in accordance with the fact that the theorems provide
lower bounds for the penalty. The correcting terms are asymptotically negligible so they
do not affect the rate of convergence. Thus, both penalties contain additional logarithmic
terms which have been calibrated in other contexts by intensive simulation experiments
(see Comte and Rozenholc [16, 17]).

The constant $\kappa$ in both penalties $\mathrm{pen}(m)$ and $\widetilde{\mathrm{pen}}(m)$ has been set equal to 4.

We retain the idea that the adequate term in the penalty was $\mathbb{E}(\sigma^2(X_0))/\Delta$ for $b$
and $\mathbb{E}(\sigma^4(X_0))$ for $\sigma^2$, instead of those obtained ($\sigma_1^2/\Delta$ and $\sigma_1^4$, respectively). Indeed, in
classical regression models, the corresponding coefficient is the variance of the noise. This
variance is usually unknown and replaced by a rough estimate. Therefore, in penalties,
$\sigma_1^2/\Delta$ and $\sigma_1^4$ are replaced by empirical variances computed using initial estimators $\hat b$,
$\hat\sigma^2$ chosen in the collection and corresponding to a space with medium dimension: $\sigma_1^2/\Delta$
for $\mathrm{pen}(\cdot)$ is replaced by $\hat s_1^2 = \gamma_n(\hat b)$ (see (10)); and $\sigma_1^4$ for the other penalty is replaced by
$\hat s_2^2 = \breve\gamma_n(\hat\sigma^2)$ (see (20)).
Finally, for $m = (p, r)$, the penalties $\mathrm{pen}(m)$ for $i = 1$ and $\widetilde{\mathrm{pen}}(m)$ for $i = 2$ are given by

\[
4\,\frac{\hat s_i^2}{n}\,2^p\big(r + 1 + \ln^{2.5}(r+1)\big).
\]
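
As a hedged illustration, the penalty can be evaluated numerically once it is read as $4(\hat s_i^2/n)\,2^p(r+1+\ln^{2.5}(r+1))$. The function name and arguments below are illustrative and are not taken from the authors' implementation; `s2_hat` stands for the preliminary variance estimate $\hat s_i^2$.

```python
import math


def penalty(p, r, s2_hat, n, kappa=4.0):
    """Penalty kappa * (s2_hat / n) * 2**p * (r + 1 + ln(r + 1)**2.5).

    The term ln(r + 1)**2.5 is the additional logarithmic correction;
    it vanishes for r = 0 since ln(1) = 0.
    """
    dim_term = r + 1 + math.log(r + 1) ** 2.5
    return kappa * s2_hat / n * 2 ** p * dim_term


pen_00 = penalty(p=0, r=0, s2_hat=1.0, n=100)
pen_10 = penalty(p=1, r=0, s2_hat=1.0, n=100)
```

The selected model would then be the pair (p, r) minimizing the empirical contrast plus this penalty over the admissible pairs.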

Figures 1-4 illustrate our simulation results. We have plotted the data points
$(X_{k\Delta}, Y_{k\Delta})$ (see (9)) and $(X_{k\Delta}, U_{k\Delta})$ (see (21)), the true functions $b$ and $\sigma^2$ and the
estimated functions based on 95% of data points. Parameters have been chosen in the
admissible range of ergodicity. The sample size n = 5000 and the step size $\Delta$ = 1/20 are
in accordance with the asymptotic context (large n and small $\Delta$) and may be relevant
for applications in finance. In these figures, the estimated functions are visually close
to the true ones.
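
The plotted data points can be produced from a discretely observed path along the following lines. This sketch assumes that (9) and (21) define $Y_{k\Delta} = (X_{(k+1)\Delta} - X_{k\Delta})/\Delta$ and $U_{k\Delta} = (X_{(k+1)\Delta} - X_{k\Delta})^2/\Delta$; the helper name is hypothetical.

```python
def regression_responses(path, delta):
    """Build the regression data used for the drift and the squared diffusion.

    path  : list of observations X_0, X_delta, X_{2 delta}, ...
    delta : sampling interval
    Returns (x, y, u) where x are the design points, y the normalized
    increments (drift responses) and u the normalized squared increments
    (responses for sigma^2).
    """
    increments = [nxt - cur for cur, nxt in zip(path[:-1], path[1:])]
    y = [d / delta for d in increments]
    u = [d * d / delta for d in increments]
    return path[:-1], y, u


x, y, u = regression_responses([0.0, 0.5, 0.25], delta=0.5)
```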

The simulation of sample paths does not rely on Euler schemes, whereas the estimation
method does. Therefore, the data simulation method is independent of the estimation
procedures and cannot be suspected of being favourable to our estimation algorithm.
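
To illustrate what simulation without an Euler scheme means, here is a minimal sketch for one model where the transition law is known in closed form. The Ornstein-Uhlenbeck model $dX_t = -\theta X_t\,dt + \sigma\,dW_t$ is chosen here only for illustration; it is an assumption of this sketch, and the function name and parameters are hypothetical.

```python
import math
import random


def simulate_ou_exact(n, delta, theta=1.0, sigma=1.0, x0=None, seed=0):
    """Exact simulation of an Ornstein-Uhlenbeck path at times 0, delta, ..., n*delta.

    Uses the Gaussian transition X_{t+delta} | X_t ~ N(X_t * exp(-theta*delta),
    sigma^2 * (1 - exp(-2*theta*delta)) / (2*theta)), so no discretization
    error is introduced. If x0 is None, X_0 is drawn from the stationary law.
    """
    rng = random.Random(seed)
    stat_sd = sigma / math.sqrt(2.0 * theta)
    decay = math.exp(-theta * delta)
    step_sd = stat_sd * math.sqrt(1.0 - decay * decay)
    x = rng.gauss(0.0, stat_sd) if x0 is None else x0
    path = [x]
    for _ in range(n):
        x = decay * x + step_sd * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


path = simulate_ou_exact(n=5000, delta=1 / 20)
```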

6.1. Proof of Proposition 1

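We recall that for $A$-supported functions, $\|t\|_\pi^2 = \int_A t^2(x)\pi(x)\,dx$. Starting from (13)-(14), we obtain

\[
\begin{aligned}
\|\hat b_m - b_A\|_n^2 &\le \|b_m - b_A\|_n^2 + 2\|\hat b_m - b_m\|_\pi \sup_{t\in S_m,\,\|t\|_\pi=1} |\nu_n(t)| \\
&\quad + 2\|\hat b_m - b_m\|_n \left(\frac{1}{n}\sum_{k=1}^n \left(\frac{1}{\Delta}\int_{k\Delta}^{(k+1)\Delta} (b(X_s) - b(X_{k\Delta}))\,ds\right)^2\right)^{1/2} \\
&\le \|b_m - b_A\|_n^2 + \frac{1}{8}\|\hat b_m - b_m\|_\pi^2 + 8 \sup_{t\in S_m,\,\|t\|_\pi=1} [\nu_n(t)]^2 \\
&\quad + \frac{1}{8}\|\hat b_m - b_m\|_n^2 + \frac{8}{n\Delta^2}\sum_{k=1}^n \left(\int_{k\Delta}^{(k+1)\Delta} (b(X_s) - b(X_{k\Delta}))\,ds\right)^2 .
\end{aligned}
\]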

Because the $L^2$-norm, $\|\cdot\|_\pi$, and the empirical norm (12) are not equivalent, we must
introduce a set on which they are and then prove that the complement of this set has
small probability. Let us define (see (6))



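\[
\Omega_n = \left\{ \omega : \left| \frac{\|t\|_n^2}{\|t\|_\pi^2} - 1 \right| \le \frac{1}{2}, \ \forall t \in \bigcup_{m,m'\in\mathcal{M}_n} (S_m + S_{m'}) \setminus \{0\} \right\}. \tag{37}
\]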



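On $\Omega_n$, $\|\hat b_m - b_m\|_\pi^2 \le 2\|\hat b_m - b_m\|_n^2$ and $\|\hat b_m - b_m\|_n^2 \le 2(\|\hat b_m - b_A\|_n^2 + \|b_m - b_A\|_n^2)$. Hence,
some elementary computations yield:

\[
\frac{1}{4}\|\hat b_m - b_A\|_n^2 \mathbf{1}_{\Omega_n} \le \frac{7}{4}\|b_m - b_A\|_n^2 + 8 \sup_{t\in S_m,\,\|t\|_\pi=1} [\nu_n(t)]^2 + \frac{8}{n\Delta^2}\sum_{k=1}^n \left(\int_{k\Delta}^{(k+1)\Delta} (b(X_s) - b(X_{k\Delta}))\,ds\right)^2 .
\]

Now, using (3), we obtain

\[
\mathbb{E}\left[\left(\int_{k\Delta}^{(k+1)\Delta} (b(X_s) - b(X_{k\Delta}))\,ds\right)^2\right] \le \Delta \int_{k\Delta}^{(k+1)\Delta} \mathbb{E}[(b(X_s) - b(X_{k\Delta}))^2]\,ds \le c'\Delta^3 .
\]

Consequently,

\[
\mathbb{E}(\|\hat b_m - b_A\|_n^2 \mathbf{1}_{\Omega_n}) \le 7\|b_m - b_A\|_\pi^2 + 32\,\mathbb{E}\Big(\sup_{t\in S_m,\,\|t\|_\pi=1} [\nu_n(t)]^2\Big) + 32c'\Delta. \tag{38}
\]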
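
The elementary computations behind the factors 1/4 and 7/4 can be spelled out as follows. This is only a sketch of the intermediate step, using the two inequalities valid on $\Omega_n$ that are stated above:

\[
\frac{1}{8}\|\hat b_m - b_m\|_\pi^2 + \frac{1}{8}\|\hat b_m - b_m\|_n^2
\le \frac{3}{8}\|\hat b_m - b_m\|_n^2
\le \frac{3}{4}\big(\|\hat b_m - b_A\|_n^2 + \|b_m - b_A\|_n^2\big).
\]

Moving $\frac{3}{4}\|\hat b_m - b_A\|_n^2$ to the left-hand side leaves the factor $1 - \frac{3}{4} = \frac{1}{4}$ there and $1 + \frac{3}{4} = \frac{7}{4}$ in front of $\|b_m - b_A\|_n^2$. Multiplying by 4 and taking expectations, the last term contributes at most $4 \cdot \frac{8}{n\Delta^2} \cdot n c'\Delta^3 = 32c'\Delta$.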

Next, using (5), (7)-(9) and (13), it is easy to see that, since $\|t\|_\pi = 1 \Rightarrow \|t\|^2 \le 1/\pi_0$,

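\[
\begin{aligned}
\mathbb{E}\Big(\sup_{t\in S_m,\,\|t\|_\pi=1} [\nu_n(t)]^2\Big)
&\le \frac{1}{\pi_0}\,\mathbb{E}\Big(\sup_{t\in S_m,\,\|t\|\le 1} [\nu_n(t)]^2\Big)
\le \frac{1}{\pi_0}\sum_{\lambda\in\Lambda_m} \mathbb{E}[\nu_n^2(\varphi_\lambda)] \\
&= \frac{1}{\pi_0 n\Delta^2}\,\mathbb{E}\Big[\sum_{\lambda\in\Lambda_m}\varphi_\lambda^2(X_0)\int_0^{\Delta}\sigma^2(X_s)\,ds\Big]
\le \frac{\Phi_0^2\,\mathbb{E}(\sigma^2(X_0))\,D_m}{\pi_0 n\Delta} .
\end{aligned}
\]

Gathering bounds, and using the upper bound $\pi_1$ defined in (5), we obtain

\[
\mathbb{E}(\|\hat b_m - b_A\|_n^2 \mathbf{1}_{\Omega_n}) \le 7\pi_1\|b_m - b_A\|^2 + 32\,\frac{\Phi_0^2\,\mathbb{E}(\sigma^2(X_0))\,D_m}{\pi_0 n\Delta} + 32c'\Delta .
\]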

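Now, all that remains is to deal with $\Omega_n^c$. Since $\|\hat b_m - b_A\|_n^2 \le \|\hat b_m - b\|_n^2$, it is enough
to check that $\mathbb{E}(\|\hat b_m - b\|_n^2 \mathbf{1}_{\Omega_n^c}) \le c/n$. Write the regression model as $Y_{k\Delta} = b(X_{k\Delta}) + \varepsilon_{k\Delta}$
with

\[
\varepsilon_{k\Delta} = \frac{1}{\Delta}\int_{k\Delta}^{(k+1)\Delta} [b(X_s) - b(X_{k\Delta})]\,ds + \frac{1}{\Delta}\int_{k\Delta}^{(k+1)\Delta} \sigma(X_s)\,dW_s .
\]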

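Recall that $\Pi_m$ denotes the orthogonal projection (with respect to the inner product of
$\mathbb{R}^n$) onto the subspace $\{(t(X_\Delta), \ldots, t(X_{n\Delta}))', t \in S_m\}$ of $\mathbb{R}^n$. We have $(\hat b_m(X_\Delta), \ldots, \hat b_m(X_{n\Delta}))' = \Pi_m Y$,
where $Y = (Y_\Delta, \ldots, Y_{n\Delta})'$. Using the same notation for the function $t$ and the vector
$(t(X_\Delta), \ldots, t(X_{n\Delta}))'$, we see that

\[
\|b - \hat b_m\|_n^2 = \|b - \Pi_m b\|_n^2 + \|\Pi_m \varepsilon\|_n^2 \le \|b\|_n^2 + n^{-1}\sum_{k=1}^n \varepsilon_{k\Delta}^2 .
\]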



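Therefore,

\[
\begin{aligned}
\mathbb{E}(\|b - \hat b_m\|_n^2 \mathbf{1}_{\Omega_n^c}) &\le \mathbb{E}(\|b\|_n^2 \mathbf{1}_{\Omega_n^c}) + \frac{1}{n}\sum_{k=1}^n \mathbb{E}(\varepsilon_{k\Delta}^2 \mathbf{1}_{\Omega_n^c}) \\
&\le \big(\mathbb{E}^{1/2}(b^4(X_0)) + \mathbb{E}^{1/2}(\varepsilon_\Delta^4)\big)\,\mathbb{P}^{1/2}(\Omega_n^c).
\end{aligned}
\]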

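By Assumption 1(ii) we have $\mathbb{E}(b^4(X_0)) \le c(1 + \mathbb{E}(X_0^4)) = K$. With the Burkholder-Davis-Gundy
inequality, we find, for a universal constant $C$,

\[
\mathbb{E}(\varepsilon_\Delta^4) \le \frac{2^3}{\Delta}\int_{\Delta}^{2\Delta} \mathbb{E}[(b(X_s) - b(X_\Delta))^4]\,ds + \frac{2^3 C}{\Delta^4}\,\mathbb{E}\left[\left(\int_{\Delta}^{2\Delta} \sigma^2(X_s)\,ds\right)^2\right].
\]

Under Assumptions 1, 2(i) and 3 and inequality (3), we obtain $\mathbb{E}(\varepsilon_\Delta^4) \le C(1 + \sigma_1^4/\Delta^2) := C'/\Delta^2$.
The next lemma enables us to complete the proof.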

Lemma 1. Let $\Omega_n$ be defined by (37) and assume that $n\Delta_n/\ln^2(n) \to +\infty$ when
$n \to +\infty$. Then, if $N_n \le O(n\Delta_n/\ln^2(n))$ for collections [DP] and [W], and if
$N_n \le O((n\Delta_n)^{1/2}/\ln(n))$ for collection [T], then

\[
\mathbb{P}(\Omega_n^c) \le \frac{c}{n^4}. \tag{39}
\]

The proof of Lemma 1 is given in Section 7.

Now, we gather all terms and use (39) to obtain (15).



6.2. Proof of Theorem 1 

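The proof relies on the following Bernstein-type inequality:

Lemma 2. Under the assumptions of Theorem 1, for any positive numbers $\varepsilon$ and $v$, we
have

\[
\mathbb{P}\left(\sum_{k=1}^n t(X_{k\Delta}) Z_{k\Delta} \ge n\varepsilon, \ \|t\|_n^2 \le v^2\right) \le \exp\left(-\frac{n\Delta\varepsilon^2}{2\sigma_1^2 v^2}\right).
\]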



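Proof. We use the fact that $\sum_{k=1}^n t(X_{k\Delta}) Z_{k\Delta}$ can be written as a stochastic integral.

Consider the process

\[
H_u = H_u^n = \sum_{k=1}^n \mathbf{1}_{[k\Delta,(k+1)\Delta[}(u)\, t(X_{k\Delta})\,\sigma(X_u),
\]

which satisfies $H_u^2 \le \sigma_1^2\|t\|_\infty^2$ for all $u \ge 0$. Then, writing $M_s = \int_0^s H_u\,dW_u$, we obtain
that

\[
M_{(n+1)\Delta} = \sum_{k=1}^n t(X_{k\Delta})\int_{k\Delta}^{(k+1)\Delta} \sigma(X_s)\,dW_s, \qquad
\langle M\rangle_{(n+1)\Delta} = \sum_{k=1}^n t^2(X_{k\Delta})\int_{k\Delta}^{(k+1)\Delta} \sigma^2(X_s)\,ds .
\]

Moreover, $\langle M\rangle_s = \int_0^s H_u^2\,du \le n\sigma_1^2\Delta\|t\|_\infty^2$, for all $s \ge 0$, so that $(M_s)$ and $\exp(\lambda M_s - \lambda^2\langle M\rangle_s/2)$
are martingales with respect to the filtration $\mathcal{F}_s = \sigma(X_u, u \le s)$. Therefore,
for all $s \ge 0$, $c > 0$, $d > 0$, $\lambda > 0$,

\[
\mathbb{P}(M_s \ge c, \langle M\rangle_s \le d) \le \mathbb{P}\left(\exp\Big(\lambda M_s - \frac{\lambda^2}{2}\langle M\rangle_s\Big) \ge \exp\Big(\lambda c - \frac{\lambda^2}{2} d\Big)\right)
\le \exp\left(-\Big(\lambda c - \frac{\lambda^2}{2} d\Big)\right).
\]

Therefore,

\[
\mathbb{P}(M_s \ge c, \langle M\rangle_s \le d) \le \inf_{\lambda>0} \exp\left(-\Big(\lambda c - \frac{\lambda^2}{2} d\Big)\right) = \exp\left(-\frac{c^2}{2d}\right).
\]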
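
For completeness, the infimum over $\lambda$ can be computed explicitly. The following is a routine calculus step, written out here as a reading aid:

\[
\frac{d}{d\lambda}\Big(\lambda c - \frac{\lambda^2}{2} d\Big) = c - \lambda d = 0
\iff \lambda^* = \frac{c}{d},
\qquad
\lambda^* c - \frac{(\lambda^*)^2}{2} d = \frac{c^2}{d} - \frac{c^2}{2d} = \frac{c^2}{2d}.
\]

Since $c > 0$ and $d > 0$, $\lambda^*$ is positive and is the maximizer of the concave quadratic, which gives the exponent $-c^2/(2d)$.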



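Finally,

\[
\begin{aligned}
\mathbb{P}\left(\sum_{k=1}^n t(X_{k\Delta}) Z_{k\Delta} \ge n\varepsilon, \ \|t\|_n^2 \le v^2\right)
&\le \mathbb{P}\big(M_{(n+1)\Delta} \ge n\Delta\varepsilon, \ \langle M\rangle_{(n+1)\Delta} \le n v^2\sigma_1^2\Delta\big) \\
&\le \exp\left(-\frac{(n\Delta\varepsilon)^2}{2 n v^2\sigma_1^2\Delta}\right) = \exp\left(-\frac{n\varepsilon^2\Delta}{2 v^2\sigma_1^2}\right). \qquad \Box
\end{aligned}
\]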



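Now we turn to the proof of Theorem 1. As in the proof of Proposition 1, we have to
split $\|\hat b_{\hat m} - b_A\|_n^2 = \|\hat b_{\hat m} - b_A\|_n^2 \mathbf{1}_{\Omega_n} + \|\hat b_{\hat m} - b_A\|_n^2 \mathbf{1}_{\Omega_n^c}$. For the treatment of $\Omega_n^c$, the end
of the proof of Proposition 1 can be used.

We now focus on what happens on $\Omega_n$. From the definition of $\hat b_{\hat m}$, we have, for all
$m \in \mathcal{M}_n$, $\gamma_n(\hat b_{\hat m}) + \mathrm{pen}(\hat m) \le \gamma_n(b_m) + \mathrm{pen}(m)$. We proceed as in the proof of Proposition
1 with the additional penalty terms (see (38)) and obtain

\[
\begin{aligned}
\mathbb{E}(\|\hat b_{\hat m} - b_A\|_n^2 \mathbf{1}_{\Omega_n}) &\le 7\pi_1\|b_m - b_A\|^2 + 4\,\mathrm{pen}(m) + 32\,\mathbb{E}\Big(\sup_{t\in S_m+S_{\hat m},\,\|t\|_\pi=1} [\nu_n(t)]^2 \mathbf{1}_{\Omega_n}\Big) \\
&\quad - 4\,\mathbb{E}(\mathrm{pen}(\hat m)) + 32c'\Delta .
\end{aligned}
\]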

The main problem here is to control the supremum of $\nu_n(t)$ on a random ball (which
depends on the random $\hat m$). This is done by using the martingale property of $\nu_n(t)$.
Let us introduce the notation

\[
G_m(m') = \sup_{t\in S_m+S_{m'},\,\|t\|_\pi=1} |\nu_n(t)| .
\]

Now, we plug in a function $p(m,m')$, which will in turn fix the penalty:

\[
G_m^2(\hat m)\mathbf{1}_{\Omega_n} \le [(G_m^2(\hat m) - p(m,\hat m))\mathbf{1}_{\Omega_n}]_+ + p(m,\hat m)
\le \sum_{m'\in\mathcal{M}_n} [(G_m^2(m') - p(m,m'))\mathbf{1}_{\Omega_n}]_+ + p(m,\hat m).
\]

And pen is chosen such that $8p(m,m') \le \mathrm{pen}(m) + \mathrm{pen}(m')$. More precisely, the next
proposition determines the choice of $p(m,m')$.

Proposition 3. Under the assumptions of Theorem 1, there exists a numerical constant
$\kappa_1$ such that, for $p(m,m') = \kappa_1\sigma_1^2(D_m + D_{m'})/(n\Delta)$, we have

\[
\mathbb{E}[(G_m^2(m') - p(m,m'))\mathbf{1}_{\Omega_n}]_+ \le c\,\sigma_1^2\,\frac{e^{-D_{m'}}}{n\Delta} .
\]

Proof of Proposition 3. The result of Proposition 3 follows from the inequality of
Lemma 2 by the $L^2$-chaining technique used in Baraud et al. [5] (see their Section 7,
pages 44-47, Lemma 7.1, with the relevant constant there equal to $\sigma_1^2/\Delta$). □

It is easy to see that the result of Theorem 1 follows from Proposition 3 with $\mathrm{pen}(m) \ge
\kappa\sigma_1^2 D_m/(n\Delta)$ and $\kappa = 8\kappa_1$.



6.3. Proof of Proposition 2 

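First, we prove that

\[
\mathbb{E}[R_{k\Delta}^2] \le K\Delta^2. \tag{40}
\]

With the obvious convention, let $R_{k\Delta} = R^{(1)}_{k\Delta} + R^{(2)}_{k\Delta} + R^{(3)}_{k\Delta}$ so that (40) holds if
$\mathbb{E}[(R^{(i)}_{k\Delta})^2] \le K_i\Delta^2$ for $i = 1, 2, 3$. Using Assumption 1,

\[
\mathbb{E}[(R^{(1)}_{k\Delta})^2] \le \Delta^2\,\mathbb{E}(b^4(X_0)) \le c\Delta^2 .
\]

We also have

\[
\mathbb{E}[(R^{(2)}_{k\Delta})^2] \le \frac{4}{\Delta^2}\left(\mathbb{E}\left[\left(\int_{k\Delta}^{(k+1)\Delta} (b(X_s) - b(X_{k\Delta}))\,ds\right)^4\right] \mathbb{E}\left[\left(\int_{k\Delta}^{(k+1)\Delta} \sigma(X_s)\,dW_s\right)^4\right]\right)^{1/2} .
\]

Using (3), we obtain

\[
\mathbb{E}[(R^{(2)}_{k\Delta})^2] \le c'\Delta^2 .
\]

Lastly, using Assumptions 1 and 2 and equation (22),

\[
\mathbb{E}[(R^{(3)}_{k\Delta})^2] \le \frac{1}{\Delta}\,\mathbb{E}\left(\int_{k\Delta}^{(k+1)\Delta} ((k+1)\Delta - s)^2\,\psi^2(X_s)\,ds\right) \le \mathbb{E}(\psi^2(X_0))\,\frac{\Delta^2}{3} \le c''\Delta^2 .
\]

Therefore (40) is proved.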
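
We now return to (23) and recall that $\Omega_n$ is defined by (37). The treatment is similar
to that for the drift estimator. On $\Omega_n$, $\|\hat\sigma_m^2 - \sigma_m^2\|_\pi^2 \le 2\|\hat\sigma_m^2 - \sigma_m^2\|_n^2$,

\[
\begin{aligned}
\|\hat\sigma_m^2 - \sigma_A^2\|_n^2 &\le \|\sigma_m^2 - \sigma_A^2\|_n^2 + \frac{1}{8}\|\hat\sigma_m^2 - \sigma_m^2\|_\pi^2 + 8 \sup_{t\in S_m,\,\|t\|_\pi=1} \breve\nu_n^2(t) \\
&\quad + \frac{1}{8}\|\hat\sigma_m^2 - \sigma_m^2\|_n^2 + \frac{8}{n}\sum_{k=1}^n R_{k\Delta}^2 .
\end{aligned}
\]

Setting $B_m(0,1) = \{t \in S_m, \|t\| \le 1\}$ and $B_m^\pi(0,1) = \{t \in S_m, \|t\|_\pi \le 1\}$, the following
holds on $\Omega_n$:

\[
\frac{1}{4}\|\hat\sigma_m^2 - \sigma_A^2\|_n^2 \le \frac{7}{4}\|\sigma_m^2 - \sigma_A^2\|_n^2 + 8 \sup_{t\in B_m^\pi(0,1)} \breve\nu_n^2(t) + \frac{8}{n}\sum_{k=1}^n R_{k\Delta}^2 .
\]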



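Moreover,

\[
\begin{aligned}
\mathbb{E}\Big(\sup_{t\in B_m^\pi(0,1)} \breve\nu_n^2(t)\Big) &\le \frac{1}{\pi_0}\,\mathbb{E}\Big(\sup_{t\in B_m(0,1)} \breve\nu_n^2(t)\Big) \le \frac{1}{\pi_0}\sum_{\lambda\in\Lambda_m} \mathbb{E}(\breve\nu_n^2(\varphi_\lambda)) \\
&\le \frac{\Phi_0^2 D_m}{\pi_0 n}\,\{12\,\mathbb{E}(\sigma^4(X_0)) + 4\Delta C_{b,\sigma}\},
\end{aligned}
\]

where $C_{b,\sigma} = \mathbb{E}((\sigma'\sigma^2)^2(X_0)) + \sigma_1^2\,\mathbb{E}(b^2(X_0))$. Now using the condition on $N_n$, we have
$\Delta D_m/n \le \Delta N_n/n \le \Delta^2/\ln^2(n)$. This yields the first three terms of the right-hand side
of (24).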

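The treatment of $\Omega_n^c$ is the same as for $b$ with the regression model $U_{k\Delta} = \sigma^2(X_{k\Delta}) + \eta_{k\Delta}$,
where $\eta_{k\Delta} = V_{k\Delta} + R_{k\Delta}$. By standard inequalities, $\mathbb{E}(\eta_\Delta^4) \le K\{\Delta^4\,\mathbb{E}(b^8(X_0)) +
\mathbb{E}(\sigma^8(X_0))\}$. Hence, $\mathbb{E}(\eta_\Delta^4)$ is bounded. Moreover, using Lemma 1, $\mathbb{P}(\Omega_n^c) \le c/n^4$.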

6.4. Proof of Theorem 2 

This proof follows the same lines as the proof of Theorem 1. We start with a Bernstein- 
type inequality. 

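Lemma 3. Under the assumptions of Theorem 2,

\[
\mathbb{P}\left(\sum_{k=1}^n t(X_{k\Delta}) V^{(1)}_{k\Delta} \ge n\varepsilon, \ \|t\|_n^2 \le v^2\right) \le \exp\left(-Cn\,\frac{\varepsilon^2/2}{2\sigma_1^4 v^2 + \varepsilon\|t\|_\infty\sigma_1^2}\right)
\]

and

\[
\mathbb{P}\left(\frac{1}{n}\sum_{k=1}^n t(X_{k\Delta}) V^{(1)}_{k\Delta} \ge 2\sigma_1^2 v\sqrt{x} + \sigma_1^2\|t\|_\infty x, \ \|t\|_n^2 \le v^2\right) \le \exp(-Cnx). \tag{41}
\]

The non-trivial link between the above two inequalities is established in Birgé and
Massart [14], so we just prove the first.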

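Proof of Lemma 3. First we note that

\[
\begin{aligned}
\mathbb{E}\big(e^{u t(X_{n\Delta}) V^{(1)}_{n\Delta}} \,\big|\, \mathcal{F}_{n\Delta}\big) &= 1 + \sum_{p=2}^{+\infty} \frac{u^p}{p!}\,\mathbb{E}\big\{(t(X_{n\Delta}) V^{(1)}_{n\Delta})^p \,\big|\, \mathcal{F}_{n\Delta}\big\} \\
&\le 1 + \sum_{p=2}^{+\infty} \frac{u^p}{p!}\,|t(X_{n\Delta})|^p\,\mathbb{E}\big(|V^{(1)}_{n\Delta}|^p \,\big|\, \mathcal{F}_{n\Delta}\big).
\end{aligned}
\]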

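Next we apply successively the Hölder inequality and the Burkholder-Davis-Gundy
inequality with best constant (Proposition 4.2 of Barlow and Yor [6]). For a continuous
martingale $(M_t)$, with $M_0 = 0$, for $k \ge 2$, $M_t^* = \sup_{s\le t} |M_s|$ satisfies $\|M^*\|_k \le
c\,k^{1/2}\,\|\langle M\rangle^{1/2}\|_k$, with $c$ a universal constant. And we obtain (taking $c \ge 1$ without loss of generality)

\[
\begin{aligned}
\mathbb{E}\big(|V^{(1)}_{n\Delta}|^p \,\big|\, \mathcal{F}_{n\Delta}\big) &\le \frac{2^{p-1}}{\Delta^p}\left\{\mathbb{E}\left(\Big|\int_{n\Delta}^{(n+1)\Delta} \sigma(X_s)\,dW_s\Big|^{2p} \,\Big|\, \mathcal{F}_{n\Delta}\right) + \mathbb{E}\left(\Big|\int_{n\Delta}^{(n+1)\Delta} \sigma^2(X_s)\,ds\Big|^{p} \,\Big|\, \mathcal{F}_{n\Delta}\right)\right\} \\
&\le \frac{2^{p-1}}{\Delta^p}\big(c^{2p}(2p)^p\Delta^p\sigma_1^{2p} + \Delta^p\sigma_1^{2p}\big) \le (4\sigma_1^2 c^2)^p\,p^p .
\end{aligned}
\]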



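Therefore,

\[
\mathbb{E}\big(e^{u t(X_{n\Delta}) V^{(1)}_{n\Delta}} \,\big|\, \mathcal{F}_{n\Delta}\big) \le 1 + \sum_{p=2}^{\infty} \frac{p^p}{p!}\,(4u\sigma_1^2 c^2)^p\,|t(X_{n\Delta})|^p .
\]

Using $p^p/p! \le e^{p-1}$, we find

\[
\mathbb{E}\big(e^{u t(X_{n\Delta}) V^{(1)}_{n\Delta}} \,\big|\, \mathcal{F}_{n\Delta}\big) \le 1 + e^{-1}\sum_{p=2}^{\infty} (4u\sigma_1^2 c^2 e)^p\,|t(X_{n\Delta})|^p
\le 1 + e^{-1}\,\frac{(4u\sigma_1^2 c^2 e)^2\,t^2(X_{n\Delta})}{1 - 4u\sigma_1^2 c^2 e\,\|t\|_\infty} .
\]

Now, let us set

\[
a = e\,(4\sigma_1^2 c^2)^2 \quad\text{and}\quad b = 4\sigma_1^2 c^2 e\,\|t\|_\infty .
\]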
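
Since for $x \ge 0$, $1 + x \le e^x$, we obtain, for all $u$ such that $bu < 1$,

\[
\mathbb{E}\big(e^{u t(X_{n\Delta}) V^{(1)}_{n\Delta}} \,\big|\, \mathcal{F}_{n\Delta}\big) \le 1 + \frac{a u^2 t^2(X_{n\Delta})}{1 - bu} \le \exp\left(\frac{a u^2 t^2(X_{n\Delta})}{1 - bu}\right).
\]

This can also be written as

\[
\mathbb{E}\left(\exp\left(u t(X_{n\Delta}) V^{(1)}_{n\Delta} - \frac{a u^2 t^2(X_{n\Delta})}{1 - bu}\right) \,\Big|\, \mathcal{F}_{n\Delta}\right) \le 1 .
\]

Therefore, iterating conditional expectations yields

\[
\mathbb{E}\left[\exp\left\{\sum_{k=1}^n \left(u t(X_{k\Delta}) V^{(1)}_{k\Delta} - \frac{a u^2 t^2(X_{k\Delta})}{1 - bu}\right)\right\}\right] \le 1 .
\]

Then we deduce that

\[
\begin{aligned}
\mathbb{P}\left(\sum_{k=1}^n t(X_{k\Delta}) V^{(1)}_{k\Delta} \ge n\varepsilon, \ \|t\|_n^2 \le v^2\right)
&\le e^{-nu\varepsilon}\,\mathbb{E}\left[\mathbf{1}_{\|t\|_n^2 \le v^2}\exp\left\{u\sum_{k=1}^n t(X_{k\Delta}) V^{(1)}_{k\Delta}\right\}\right] \\
&\le e^{-nu\varepsilon}\,e^{(n a u^2 v^2)/(1-bu)}\,\mathbb{E}\left[\exp\left\{\sum_{k=1}^n \left(u t(X_{k\Delta}) V^{(1)}_{k\Delta} - \frac{a u^2 t^2(X_{k\Delta})}{1 - bu}\right)\right\}\right] \\
&\le e^{-nu\varepsilon}\,e^{(n a u^2 v^2)/(1-bu)} .
\end{aligned}
\]

The inequality holds for any $u$ such that $bu < 1$. In particular, $u = \varepsilon/(2av^2 + \varepsilon b)$ gives
$-u\varepsilon + a u^2 v^2/(1 - bu) = -(1/2)\,\varepsilon^2/(2av^2 + \varepsilon b)$ and therefore

\[
\mathbb{P}\left(\sum_{k=1}^n t(X_{k\Delta}) V^{(1)}_{k\Delta} \ge n\varepsilon, \ \|t\|_n^2 \le v^2\right) \le \exp\left(-n\,\frac{\varepsilon^2/2}{2av^2 + \varepsilon b}\right).
\]

□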
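
The algebraic identity used for the choice of $u$ can be checked numerically. The sketch below is only a sanity check with arbitrary positive test values; the function names are hypothetical and `v2` stands for $v^2$.

```python
def chernoff_exponent(u, eps, a, b, v2):
    """Exponent -u*eps + a*u^2*v2 / (1 - b*u), defined for b*u < 1."""
    return -u * eps + a * u * u * v2 / (1.0 - b * u)


def chosen_u(eps, a, b, v2):
    """The value u = eps / (2*a*v2 + eps*b) used at the end of the proof."""
    return eps / (2.0 * a * v2 + eps * b)


def closed_form(eps, a, b, v2):
    """Claimed value of the exponent at chosen_u: -(1/2) * eps^2 / (2*a*v2 + eps*b)."""
    return -0.5 * eps * eps / (2.0 * a * v2 + eps * b)


eps, a, b, v2 = 0.3, 2.0, 0.5, 1.5
u_star = chosen_u(eps, a, b, v2)
gap = abs(chernoff_exponent(u_star, eps, a, b, v2) - closed_form(eps, a, b, v2))
```

Note that $b u < 1$ holds automatically for this $u$, since $bu = \varepsilon b/(2av^2 + \varepsilon b)$.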



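As for $\hat b_{\hat m}$, we introduce the additional penalty terms and obtain that the risk satisfies

\[
\begin{aligned}
\mathbb{E}(\|\hat\sigma_{\hat m}^2 - \sigma_A^2\|_n^2 \mathbf{1}_{\Omega_n}) &\le 7\pi_1\|\sigma_m^2 - \sigma_A^2\|^2 + 4\,\widetilde{\mathrm{pen}}(m) + 32\,\mathbb{E}\Big(\sup_{t\in B^\pi_{m,\hat m}(0,1)} (\breve\nu_n(t))^2 \mathbf{1}_{\Omega_n}\Big) \\
&\quad - 4\,\mathbb{E}(\widetilde{\mathrm{pen}}(\hat m)) + K'\Delta^2, 
\end{aligned}
\tag{42}
\]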
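where $B^\pi_{m,m'}(0,1) = \{t \in S_m + S_{m'}, \|t\|_\pi = 1\}$. Let us denote by

\[
\breve G_m(m') = \sup_{t\in B^\pi_{m,m'}(0,1)} |\breve\nu_n^{(1)}(t)|
\]

the main quantity to be studied, where

\[
\breve\nu_n^{(1)}(t) = \frac{1}{n}\sum_{k=1}^n t(X_{k\Delta})\,V^{(1)}_{k\Delta};
\]

define also

\[
\breve\nu_n^{(2)}(t) = \frac{1}{n}\sum_{k=1}^n t(X_{k\Delta})\,(V^{(2)}_{k\Delta} + V^{(3)}_{k\Delta}).
\]

As for the drift, we write

\[
\mathbb{E}(\breve G_m^2(\hat m)\mathbf{1}_{\Omega_n}) \le \mathbb{E}[(\breve G_m^2(\hat m) - \tilde p(m,\hat m))\mathbf{1}_{\Omega_n}]_+ + \mathbb{E}(\tilde p(m,\hat m)).
\]

Now we have the following statement.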

Proposition 4. Under the assumptions of Theorem 2, for 

„ , 4fD„ + D™' , <^lfD„,, + D,, 
p(m, m ) — ~ 



n TTo 
where k* is a numerical constant, we have 

E[(G^(m') -p(m,m'))lnj+ < caf . 

The result of Proposition 4 is obtained from inequality (41) of Lemma 3 by an
$L^2(\pi)$-$L^\infty$ chaining technique. For a description of this method, in a more general
setting, we refer to Propositions 2-4 in Comte ([15], pages 282-287), to Theorem 5 in
Birgé and Massart [14] and to Proposition 7 and Theorems 8 and 9 in Barron et al.
[8]. Note that there is a difference between Propositions 3 and 4 which comes from the
additional term $\|t\|_\infty$ appearing in Lemma 3. For this reason, we need to use the fact

that II EAeA„ /3AV'A||oo/sup;,gA„ \Px \ < II E IV-aIIIoo < (?-max + l)AWl'd^^ fol' ('0a)agA„ 

an $L^2(\pi)$-orthonormal basis constructed by orthonormalisation of the $(\varphi_\lambda)$. This explains
the additional term appearing in $\tilde p(m,m')$.

Choosing $\widetilde{\mathrm{pen}}(m) \ge \kappa\sigma_1^4 D_m/n$ with $\kappa = 16\kappa^*$, we deduce from (42), Proposition 4 and
$D_m \le N_n \le n\Delta/\ln^2(n)$ that

E{\\al-a\\\l)<7n,\\<jl-a\r + 8^{m) + cat ^ ^ + , f, , 
+ 64E sup iPi'Ht))')+K'A'+Ei\\al-al\\lln.^). 

The bound for $\mathbb{E}(\|\hat\sigma_{\hat m}^2 - \sigma_A^2\|_n^2 \mathbf{1}_{\Omega_n^c})$ is the same as that given in the end of the proof of
Proposition 2. It is less than $c/n$ provided that $N_n \le n\Delta/\ln^2(n)$ for [DP] and [W] and
$N_n^2 \le n\Delta/\ln^2(n)$ for [T].

Since the spaces are all contained in a space denoted by $S_n$ with dimension $N_n$ bounded
as right above, we have

Ef sup {i)i^^){t)f)<-E( sup ii^i?Ht)f)<Ka,„'^l^<K'A\ 

\t£B^^^{0,l) / VtG5„,||t|| = l / I'D" 

The result of Theorem 2 follows.



7. Proof of Lemma 1 

Using Baraud et al. [4], we prove that, for all $n$ and $\Delta > 0$,

\[
\mathbb{P}(\Omega_n^c) \le 2n\beta_X(q_n\Delta) + 2n^2\exp\left(-C_0\,\frac{n}{q_n L_n(\varphi)}\right), \tag{43}
\]

where $C_0$ is a constant depending on $\pi_0, \pi_1$, $q_n$ is an integer such that $q_n < n$, and $L_n(\varphi)$
is a quantity depending on the basis of the largest nesting space $S_n$ of the collection and
is defined below. We recall that $N_n = \dim(S_n)$.

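We first prove (43). We use Berbee's coupling method as in Proposition 5.1 of Viennet
[32] and its proof. We assume that $n = 2p_n q_n$. Then there exist random variables $X^*_{i\Delta}$,
$i = 1, \ldots, n$, satisfying the following properties:

• For $\ell = 1, \ldots, p_n$, the random vectors $U_{\ell,1} = (X_{[2(\ell-1)q_n+1]\Delta}, \ldots, X_{(2\ell-1)q_n\Delta})'$ and
$U^*_{\ell,1} = (X^*_{[2(\ell-1)q_n+1]\Delta}, \ldots, X^*_{(2\ell-1)q_n\Delta})'$ have the same distribution, and so have the
vectors $U_{\ell,2} = (X_{[(2\ell-1)q_n+1]\Delta}, \ldots, X_{2\ell q_n\Delta})'$ and $U^*_{\ell,2} = (X^*_{[(2\ell-1)q_n+1]\Delta}, \ldots, X^*_{2\ell q_n\Delta})'$.

• For $\ell = 1, \ldots, p_n$, $\mathbb{P}(U_{\ell,1} \ne U^*_{\ell,1}) \le \beta_X(q_n\Delta)$ and $\mathbb{P}(U_{\ell,2} \ne U^*_{\ell,2}) \le \beta_X(q_n\Delta)$.

• For each $\delta \in \{1, 2\}$, the random vectors $U^*_{1,\delta}, \ldots, U^*_{p_n,\delta}$ are independent.

Let us define $\Omega^* = \{X_{i\Delta} = X^*_{i\Delta}, i = 1, \ldots, n\}$. We have $\mathbb{P}(\Omega_n^c) \le \mathbb{P}(\Omega_n^c \cap \Omega^*) + \mathbb{P}(\Omega^{*c})$ and
clearly

\[
\mathbb{P}(\Omega^{*c}) \le 2p_n\beta_X(q_n\Delta) \le n\beta_X(q_n\Delta). \tag{44}
\]

Thus, (43) holds if we prove

\[
\mathbb{P}(\Omega_n^c \cap \Omega^*) \le 2N_n^2\exp\left(-C_0\,\frac{n}{q_n L_n(\varphi)}\right), \tag{45}
\]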
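where $L_n(\varphi)$ is defined as follows. Let $(\varphi_\lambda)_{\lambda\in\Lambda_n}$ be an $L^2(A, dx)$-orthonormal basis of
$S_n$ and, as in Baraud et al. [4], define the matrices

\[
V = \left(\left(\int_A \varphi_\lambda^2(x)\varphi_{\lambda'}^2(x)\,dx\right)^{1/2}\right)_{\lambda,\lambda'\in\Lambda_n\times\Lambda_n}, \qquad
B = (\|\varphi_\lambda\varphi_{\lambda'}\|_\infty)_{\lambda,\lambda'\in\Lambda_n\times\Lambda_n} .
\]

Then we set $L_n(\varphi) = \max\{\rho^2(V), \rho(B)\}$, where, for any symmetric matrix $M = (M_{\lambda,\lambda'})$,
$\rho(M) = \sup_{\{a_\lambda\},\,\sum_\lambda a_\lambda^2 \le 1} \sum_{\lambda,\lambda'} |a_\lambda||a_{\lambda'}||M_{\lambda,\lambda'}|$.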

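We now prove (45). Let $\mathbb{P}^*(\cdot) := \mathbb{P}(\cdot \cap \Omega^*)$. We use Baraud ([3], Claim 2 in Proposition 4.2).
Consider $\upsilon_n(t) = (1/n)\sum_{i=1}^n [t(X_{i\Delta}) - \mathbb{E}(t(X_{i\Delta}))]$, $B_\pi(0,1) = \{t \in S_n, \|t\|_\pi \le 1\}$
and $B(0,1) = \{t \in S_n, \|t\| \le 1\}$. As, on $A$, $\pi_0 \le \pi(x) \le \pi_1$,

\[
\sup_{t\in B_\pi(0,1)} |\upsilon_n(t^2)| = \sup_{t\in S_n\setminus\{0\}} \frac{|\upsilon_n(t^2)|}{\|t\|_\pi^2} \le \pi_0^{-1} \sup_{t\in B(0,1)} |\upsilon_n(t^2)| .
\]

Thus

\[
\begin{aligned}
\mathbb{P}^*\Big(\sup_{t\in B_\pi(0,1)} |\upsilon_n(t^2)| \ge \rho_0\Big) &\le \mathbb{P}^*\Big(\sup_{t\in B(0,1)} |\upsilon_n(t^2)| \ge \pi_0\rho_0\Big) \\
&\le \mathbb{P}^*\Big(\sup_{\sum_\lambda a_\lambda^2 \le 1} \sum_{\lambda,\lambda'} |a_\lambda a_{\lambda'}||\upsilon_n(\varphi_\lambda\varphi_{\lambda'})| \ge \pi_0\rho_0\Big).
\end{aligned}
\]

On the set $\{\forall(\lambda,\lambda') \in \Lambda_n^2, |\upsilon_n(\varphi_\lambda\varphi_{\lambda'})| \le 2V_{\lambda,\lambda'}(2\pi_1 x)^{1/2} + 3B_{\lambda,\lambda'}x\}$, we have

\[
\sup_{\sum_\lambda a_\lambda^2 \le 1} \sum_{\lambda,\lambda'} |a_\lambda a_{\lambda'}||\upsilon_n(\varphi_\lambda\varphi_{\lambda'})| \le 2\rho(V)(2\pi_1 x)^{1/2} + 3\rho(B)x .
\]

By choosing $x = (\rho_0\pi_0)^2/(16\pi_1 L_n(\varphi))$ and $\rho_0 = 1/2$, and recalling that $\pi_0 \le \pi_1$, we obtain
that

\[
\sup_{\sum_\lambda a_\lambda^2 \le 1} \sum_{\lambda,\lambda'} |a_\lambda a_{\lambda'}||\upsilon_n(\varphi_\lambda\varphi_{\lambda'})| \le \rho_0\pi_0 = \frac{\pi_0}{2} .
\]

This leads to

\[
\begin{aligned}
\mathbb{P}^*(\Omega_n^c) &= \mathbb{P}^*\Big(\sup_{t\in B_\pi(0,1)} |\upsilon_n(t^2)| > \frac{1}{2}\Big) \\
&\le \mathbb{P}^*\big(\{\exists(\lambda,\lambda') \in \Lambda_n^2, |\upsilon_n(\varphi_\lambda\varphi_{\lambda'})| > 2V_{\lambda,\lambda'}(2\pi_1 x)^{1/2} + 3B_{\lambda,\lambda'}x\}\big).
\end{aligned}
\]

The proof of (45) is then achieved by using the following claim, which is exactly Claim
6 in the proof of Proposition 7 of Baraud et al. [4].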
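
Claim 1. Let $(\varphi_\lambda)_{\lambda\in\Lambda_n}$ be an $L^2(A, dx)$-orthonormal basis of $S_n$. Then, for all $x > 0$
and all integers $q$, $1 \le q \le n$,

\[
\mathbb{P}^*\big(\exists(\lambda,\lambda') \in \Lambda_n^2 \,/\, |\upsilon_n(\varphi_\lambda\varphi_{\lambda'})| > 2V_{\lambda,\lambda'}(2\pi_1 x)^{1/2} + 3B_{\lambda,\lambda'}x\big) \le 2N_n^2\exp\left(-\frac{nx}{q}\right).
\]

Claim 1 implies that

\[
\mathbb{P}^*(\Omega_n^c) \le 2N_n^2\exp\left(-\frac{\pi_0^2\,n}{64\pi_1\,q_n L_n(\varphi)}\right)
\]

and thus (45) holds true.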

Again we refer to Baraud et al. [4] (see Lemma 2 in Section 10). It is proved there that,
for [T], $L_n(\varphi) \le C_\varphi N_n^2$. For [W] and [DP] (see Sections 2.2 and 2.3 above), $L_n(\varphi) \le C'_\varphi N_n$.
We now use (43) to complete the proof of Lemma 1. By assumption, the diffusion process
$X$ is geometrically $\beta$-mixing. So, for some constant $\theta$, $\beta_X(q_n\Delta) \le e^{-\theta q_n\Delta}$. Provided that
$\Delta = \Delta_n$ satisfies $\ln(n)/(n\Delta) \to 0$, it is possible to take $q_n = [5\ln(n)/(\theta\Delta)] + 1$. This yields

\[
\mathbb{P}(\Omega_n^c) \le \frac{2}{n^4} + 2n^2\exp\left(-C_0'\,\frac{n\Delta}{\ln(n)\,N_n}\right).
\]

The above constraint on $\Delta$ must be strengthened. Indeed, to ensure (39), we need

\[
\frac{n\Delta}{N_n} \ge \frac{6\ln^2(n)}{C_0'}, \quad\text{i.e.}\quad N_n \le \frac{C_0'}{6}\,\frac{n\Delta}{\ln^2(n)}
\]

for [W] and [DP]. This requires $n\Delta/\ln^2(n) \to +\infty$. The result for [T] follows analogously.
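
The choice of the block length $q_n$ can be illustrated numerically. This sketch assumes a mixing bound of the form $\beta_X(t) \le e^{-\theta t}$ with no leading constant, and the value of $\theta$ used below is arbitrary; the function name is hypothetical.

```python
import math


def block_length(n, delta, theta):
    """Block length q_n = floor(5 * ln(n) / (theta * delta)) + 1."""
    return math.floor(5.0 * math.log(n) / (theta * delta)) + 1


n, delta, theta = 5000, 1 / 20, 1.0
q_n = block_length(n, delta, theta)
# Since theta * q_n * delta > 5 * ln(n), the mixing term n * exp(-theta * q_n * delta)
# is smaller than n * n**(-5) = n**(-4).
mixing_term = n * math.exp(-theta * q_n * delta)
```

By construction $q_n\Delta\theta > 5\ln(n)$, so the mixing term is of order $n^{-4}$, which matches the rate required in (39).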